Issue
I have data in the form of a list of dicts (see MRE below). To keep everything strictly typed, I would always like to pass in the expected schema (dtypes) when I read this data. The pl.DataFrame constructor offers this through either schema or schema_overrides. However, I frequently run into trouble with Datetime columns in the schema, especially when their values are given as strings in the dictionaries.
Traceback
polars.exceptions.ComputeError: could not append value: "2020-02-11" of type: str to the builder; make sure that all rows have the same schema or consider increasing `infer_schema_length`
Question
Is there a way to "automatically" parse datetime strings when I construct the DataFrame (or use the pl.from_dicts() method)? I am looking for something comparable to the solution for data present as timestamps (int) in the dictionaries, which was implemented in early 2024 (GitHub issue).
Is there something similar for dates present as strings (e.g. 2022-01-01)?
Or do I have to drop every pl.Datetime key from my schema_override and then convert those columns manually via
with_columns(pl.col(list_dropped_datetime_cols).cast(pl.Datetime))
MRE
import polars as pl
schema_override = {
    "some_int_override": pl.Int8,
    "some_date_override": pl.Datetime,
}
dict_data = [
    {
        "some_int_override": 1,
        "some_date_override": "2020-02-11",
        "some_date": "2025-02-11",
    }
]
df_naiive = pl.DataFrame(dict_data)
print(df_naiive)
# raises the ComputeError from the traceback above
df_schema_override = pl.DataFrame(dict_data, schema_overrides=schema_override)
print(df_schema_override)
pl.read_csv(df_naiive.write_csv().encode(), schema_overrides=schema_override)
Comments
- The manual cast() also gives me a ComputeError, but .str.to_datetime() works. (It seems the temporal cast is going to be deprecated: github.com/pola-rs/polars/issues/23363)
- … pl.read_csv. Do you think it is worth opening a GitHub issue for this, maybe with an enhancement proposal? I know other people who have the same problem. For example, they have a pipeline with multiple containers that need to access the data, and they just want a dtype map to reconstruct the data types from the previous step.
- … DataFrame() behaves. I guess people must be using the with_columns approach if it hasn't been raised before.
- … schema= behaviour is expected for DataFrame()