**Is your feature request related to a problem? Please describe.**
The Polars DataFrame library has been gaining a lot of traction, and many people are writing new pipelines in Polars and/or migrating from pandas to Polars. It would be great to add native Polars support to the BigQuery client library.
**Describe the solution you'd like**
This request is to allow inserting data directly from a Polars DataFrame into a BigQuery table.
An additional bonus would be not requiring PyArrow to be installed.
I would be open to expanding `client.load_table_from_dataframe` to also accept Polars DataFrames, or to new dedicated method(s) being created.
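For illustration only, here is a hypothetical sketch of what such support could look like if `load_table_from_dataframe` accepted a Polars DataFrame directly. This API does not exist today, and the table name and data below are made up:

```python
import polars as pl
from google.cloud import bigquery

client = bigquery.Client()  # assumes default project and credentials
table_id = "my-project.my_dataset.my_table"  # placeholder destination table

# Nested types (structs/arrays) are part of the motivation for native support.
df = pl.DataFrame({"name": ["alice", "bob"], "scores": [[1, 2], [3]]})

# Hypothetical: load_table_from_dataframe accepting a Polars DataFrame directly,
# ideally without requiring a pandas or PyArrow round trip.
job = client.load_table_from_dataframe(df, table_id)
job.result()  # wait for the load job to complete
```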
**Describe alternatives you've considered**
- Convert to pandas at the end of the pipeline and use `client.load_table_from_dataframe` to insert the data. Not ideal to require an additional dependency just to insert data. Furthermore, I don't believe that pandas supports complex types available in both BigQuery and Polars, such as structs and arrays.
- Write the DataFrame to a bytes stream as a Parquet file and insert the data with `client.load_table_from_file` (see the sketch after this list). The intent of this code is a lot less obvious, and it would be much nicer to have more native support. Note that this is also the suggested approach in the Polars user guide (rightfully so, IMO, as it does not require any additional dependencies).
- Do not support Polars directly, but instead support inserting data from a PyArrow table. This is not currently feasible, but it would be an alternative feature request. It is not preferable, as the option above already allows inserting data without a PyArrow dependency. From looking at the docs (I haven't checked the source), this potentially has some overlap with what `client.load_table_from_dataframe` already does.
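For reference, a minimal sketch of the Parquet workaround from the second bullet, using only APIs that exist today (the project, dataset, and table names are placeholders):

```python
import io

import polars as pl
from google.cloud import bigquery

client = bigquery.Client()  # assumes default project and credentials
table_id = "my-project.my_dataset.my_table"  # placeholder destination table

df = pl.DataFrame({"name": ["alice", "bob"], "score": [0.9, 0.7]})

# Serialize the DataFrame to an in-memory Parquet file...
buffer = io.BytesIO()
df.write_parquet(buffer)
buffer.seek(0)

# ...and let BigQuery infer the schema from the Parquet metadata.
job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.PARQUET)
load_job = client.load_table_from_file(buffer, table_id, job_config=job_config)
load_job.result()  # wait for the load job to complete
```

The buffering and job-config boilerplate here is exactly what the requested feature would hide behind a single method call.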