Closed
Labels: kind/feature (New feature or request)
Description
Is your feature request related to a problem? Please describe.
The current Spark implementation scans all Parquet files. This could be made faster and more efficient by specifying a date_partition_column: during execution, that column would be used to filter the data at the file level, so only files whose date falls within the requested range would be scanned.
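To illustrate the file-level filtering described above, here is a minimal sketch assuming Hive-style partition directories (e.g. event_date=2021-06-01) with ISO-formatted dates. The function name prune_partitioned_files and the example paths are hypothetical and not part of Feast; files with no partition directory are kept, since their date cannot be determined without reading them.

```python
from datetime import date
from typing import Iterable, List


def prune_partitioned_files(
    paths: Iterable[str], partition_column: str, start: date, end: date
) -> List[str]:
    """Hypothetical helper: keep files whose `partition_column=YYYY-MM-DD`
    directory lies within [start, end] (inclusive)."""
    prefix = f"{partition_column}="
    selected = []
    for path in paths:
        for part in path.split("/"):
            if part.startswith(prefix):
                # Parse the date encoded in the directory name, e.g. "event_date=2021-06-01"
                partition_date = date.fromisoformat(part[len(prefix):])
                if start <= partition_date <= end:
                    selected.append(path)
                break
        else:
            # No partition directory found: the date is unknown, so keep the file to stay correct
            selected.append(path)
    return selected


files = [
    "data/driver_stats/event_date=2021-05-31/part-0.parquet",
    "data/driver_stats/event_date=2021-06-01/part-0.parquet",
    "data/driver_stats/event_date=2021-06-03/part-0.parquet",
]
print(prune_partitioned_files(files, "event_date", date(2021, 6, 1), date(2021, 6, 2)))
```

Only the 2021-06-01 file survives here; the other two are skipped without being opened, which is where the speedup comes from.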
Describe the solution you'd like
Add a date_partition_column option to SparkSource. A similar option already exists for AthenaSource.
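One way the proposed option could be wired in is to add an extra predicate on the partition column to the query that pulls data for a time range, so Spark can prune partitions of a table partitioned by that column. The sketch below is hypothetical: SparkSourceSketch and build_pull_query are illustrative stand-ins, not Feast's actual SparkSource or offline-store code, and it assumes the partition column stores dates as 'YYYY-MM-DD' strings.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SparkSourceSketch:
    """Hypothetical stand-in for SparkSource with the proposed option."""
    table: str
    timestamp_field: str
    date_partition_column: Optional[str] = None


def build_pull_query(source: SparkSourceSketch, start: datetime, end: datetime) -> str:
    # Row-level filter on the event timestamp
    predicates = [
        f"{source.timestamp_field} BETWEEN TIMESTAMP('{start:%Y-%m-%d %H:%M:%S}') "
        f"AND TIMESTAMP('{end:%Y-%m-%d %H:%M:%S}')"
    ]
    if source.date_partition_column:
        # Coarse filter on the partition column; lets Spark skip whole partitions
        predicates.append(
            f"{source.date_partition_column} >= '{start:%Y-%m-%d}' "
            f"AND {source.date_partition_column} <= '{end:%Y-%m-%d}'"
        )
    return f"SELECT * FROM {source.table} WHERE " + " AND ".join(predicates)


source = SparkSourceSketch("driver_stats", "event_timestamp", "event_date")
print(build_pull_query(source, datetime(2021, 6, 1, 8), datetime(2021, 6, 2, 18)))
```

The timestamp predicate is kept alongside the partition predicate because a partition covers a whole day, while the requested range may start or end mid-day. Comparing 'YYYY-MM-DD' strings works because their lexicographic order matches chronological order.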
Describe alternatives you've considered
None
I have implemented this locally and it works. I'm happy to open a PR.