Closed
Description
When using a FileSource that is in Parquet format, if the source happens to be a directory of partitioned Parquet files, the following lines throw an error:
feast/sdk/python/feast/infra/offline_stores/file_source.py, lines 182 to 184 in 01d3568:
schema = ParquetFile(
    path if filesystem is None else filesystem.open_input_file(path)
).schema_arrow
OSError: Expected file path, but /home/ubuntu/project/data/driver_stats_partitioned is a directory
How to replicate:
- Start with a demo feast project (feast init)
- Create a partitioned Parquet Dataset. Use the following to create a dataset with only a single timestamp for inference:
import pyarrow.parquet as pq
df = pq.read_table("./data/driver_stats.parquet")
df = df.drop(["created"])
pq.write_to_dataset(df, "./data/driver_stats_partitioned")
- Update the file source in example.py to look like this:
driver_hourly_stats = FileSource(
path="/home/ubuntu/cado-feast/feature_store/exciting_sunbeam/data/driver_stats_partitioned2",
)
- Run feast apply
For now, I've been able to work around this by changing the lines above to:
schema = ParquetDataset(
    path if filesystem is None else filesystem.open_input_file(path)
).schema.to_arrow_schema()