2

I have some coordinate data; some of it high precision, some of it low precision thanks to multiple data sources and other operational realities. I want to have a column that indicates the relative precision of the coordinates. So far, what I want is to essentially count digits after the decimal; in my case more digits indicates higher precision data. In my case I usually get data like the data in the example; its either coming with five to six digits precision or just one digit. Both are useful; but we can do more analysis on higher precision data as you may imagine.

This code does what I want, but it seems .... wordy, inelegant; as if I'm being paid by the line of code. Is there a simpler way to do this?

import polars as pl

df = pl.DataFrame(
    {
            "lat":  [ 43.6425047,  43.6,  40.688966,  40.6],
            "lng":  [-79.3861057, -79.3, -74.044438, -74.0],
    }
)

df = (df.with_columns(
    pl.col("lat").cast(pl.String)
    .str.split_exact(".", 1)
    .struct.rename_fields(["lat_major", "lat_minor"])
    .alias("lat_temp"))
 .unnest("lat_temp")
 .with_columns(
    pl.col("lat_minor")
    .str.len_chars()
    .alias("lat_precision"))
 .drop("lat_major", "lat_minor")
 .with_columns(
    pl.col("lng").cast(pl.String)
    .str.split_exact(".", 1)
    .struct.rename_fields(["lng_major", "lng_minor"])
    .alias("lng_temp"))
    .unnest("lng_temp")
 .with_columns(
    pl.col("lng_minor")
    .str.len_chars()
    .alias("lng_precision"))
 .drop("lng_major", "lng_minor")
 .with_columns(
    pl.col("lat_precision")
    .add(pl.col("lng_precision"))
    .alias("precision"))
 .drop("lat_precision", "lng_precision")
 )
df.head()

results in

shape: (4, 3)
┌───────────┬────────────┬───────────┐
│ lat       ┆ lng        ┆ precision │
│ ---       ┆ ---        ┆ ---       │
│ f64       ┆ f64        ┆ u32       │
╞═══════════╪════════════╪═══════════╡
│ 43.642505 ┆ -79.386106 ┆ 14        │
│ 43.6      ┆ -79.3      ┆ 2         │
│ 40.688966 ┆ -74.044438 ┆ 12        │
│ 40.6      ┆ -74.0      ┆ 2         │
└───────────┴────────────┴───────────┘

later I might pull out records with precision over 5, for instance, as my source data tends to be either one decimal point precision or four+ decimal points precision per coordinate.

1 Answer 1

1

You can extract the minor fields directly without the need for temp columns and unnesting.

df.with_columns(
    pl.col("lat", "lng").cast(pl.String)
      .str.split_exact(".", 1)
      .struct.field("field_1")
      .str.len_chars()
      .name.suffix("_minor")
)
shape: (4, 4)
┌───────────┬────────────┬───────────┬───────────┐
│ lat       ┆ lng        ┆ lat_minor ┆ lng_minor │
│ ---       ┆ ---        ┆ ---       ┆ ---       │
│ f64       ┆ f64        ┆ u32       ┆ u32       │
╞═══════════╪════════════╪═══════════╪═══════════╡
│ 43.642505 ┆ -79.386106 ┆ 7         ┆ 7         │
│ 43.6      ┆ -79.3      ┆ 1         ┆ 1         │
│ 40.688966 ┆ -74.044438 ┆ 6         ┆ 6         │
│ 40.6      ┆ -74.0      ┆ 1         ┆ 1         │
└───────────┴────────────┴───────────┴───────────┘

We're using a single pl.col("lat", "lng") call here which will go through an "expansion" step, i.e.

pl.col("lat", "lng").foo().bar()

is expanded into individual expressions.

pl.col("lat").foo().bar(),
pl.col("lng").foo().bar()

pl.sum_horizontal() can be used if you just want the totals.

df.with_columns(
    pl.sum_horizontal(
        pl.col("lat", "lng").cast(pl.String)
          .str.split_exact(".", 1)
          .struct.field("field_1")
          .str.len_chars()
    )
    .alias("precision")
)
shape: (4, 3)
┌───────────┬────────────┬───────────┐
│ lat       ┆ lng        ┆ precision │
│ ---       ┆ ---        ┆ ---       │
│ f64       ┆ f64        ┆ u32       │
╞═══════════╪════════════╪═══════════╡
│ 43.642505 ┆ -79.386106 ┆ 14        │
│ 43.6      ┆ -79.3      ┆ 2         │
│ 40.688966 ┆ -74.044438 ┆ 12        │
│ 40.6      ┆ -74.0      ┆ 2         │
└───────────┴────────────┴───────────┘
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.