fix: Fix SparkRetrievalJob.persist() failing for SparkSource by ntkathole · Pull Request #6410 · feast-dev/feast

ntkathole · 2026-05-16T15:58:45Z

What this PR does / why we need it:

Fixes #6261

SparkRetrievalJob.persist() failed in two scenarios:

1. Remote offline store path: When using type: remote in feature_store.yaml pointing to a Spark offline server, the server calls SavedDatasetStorage.from_data_source(data_source) to convert the registered SparkSource into storage. This raised ValueError because SparkSource was not registered in the _DATA_SOURCE_TO_SAVED_DATASET_STORAGE mapping, and SavedDatasetSparkStorage lacked a from_data_source() method.
2. Path-based SparkSource: When using a path-based SparkSource (e.g., S3 with parquet), persist() required a table name and raised ValueError if one wasn't provided, even though the storage had a valid path configured.
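
The two failure modes can be illustrated with a minimal, self-contained sketch. It does not import Feast or Spark and is not the code from this PR: the classes below are simplified stand-ins that only borrow the names SparkSource, SavedDatasetSparkStorage, and _DATA_SOURCE_TO_SAVED_DATASET_STORAGE from the description, while storage_from_data_source(), plan_persist(), the field names, and the example S3 path are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SparkSource:
    """Simplified stand-in for a table-based or path-based Spark source."""
    table: Optional[str] = None
    path: Optional[str] = None
    file_format: Optional[str] = None


@dataclass
class SavedDatasetSparkStorage:
    """Simplified stand-in for the Spark saved-dataset storage."""
    table: Optional[str] = None
    path: Optional[str] = None
    file_format: Optional[str] = None

    @staticmethod
    def from_data_source(source: SparkSource) -> "SavedDatasetSparkStorage":
        # Scenario 1: without a converter like this, the source cannot be
        # turned into storage on the server side.
        return SavedDatasetSparkStorage(
            table=source.table, path=source.path, file_format=source.file_format
        )


# Scenario 1: the source type must be present in the mapping, otherwise the
# lookup below raises ValueError.
_DATA_SOURCE_TO_SAVED_DATASET_STORAGE: Dict[type, type] = {
    SparkSource: SavedDatasetSparkStorage,
}


def storage_from_data_source(data_source: object) -> SavedDatasetSparkStorage:
    """Hypothetical helper mirroring the lookup-then-convert step."""
    storage_cls = _DATA_SOURCE_TO_SAVED_DATASET_STORAGE.get(type(data_source))
    if storage_cls is None:
        raise ValueError(f"Unsupported data source: {type(data_source).__name__}")
    return storage_cls.from_data_source(data_source)


def plan_persist(storage: SavedDatasetSparkStorage) -> Tuple[str, str]:
    """Hypothetical helper: pick the write target for persist().

    Scenario 2: a table name is used when present, but a configured path is
    accepted as well instead of raising.
    """
    if storage.table:
        return ("table", storage.table)
    if storage.path:
        return ("path", storage.path)
    raise ValueError("Storage needs either a table name or a path")


# Example with a made-up S3 location.
source = SparkSource(path="s3://example-bucket/saved_dataset", file_format="parquet")
storage = storage_from_data_source(source)
target = plan_persist(storage)
```

In this sketch, a path-only source converts to storage and resolves to a path-based write target, while storage with neither a table nor a path still raises ValueError.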

Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>

ntkathole self-assigned this May 16, 2026

ntkathole requested a review from a team as a code owner May 16, 2026 15:58

ntkathole force-pushed the fix_6261 branch from 89cf1a4 to 49e3ca0 Compare May 16, 2026 16:01

ntkathole added the ok-to-test label May 16, 2026

ntkathole force-pushed the fix_6261 branch 2 times, most recently from 83c73bc to b713e32 Compare June 9, 2026 06:08

ntkathole force-pushed the fix_6261 branch from b713e32 to bfa9e36 Compare June 12, 2026 13:55

franciscojavierarceo approved these changes Jun 12, 2026

fix: Fix SparkRetrievalJob.persist() failing for SparkSource

fd1752b

Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>

ntkathole force-pushed the fix_6261 branch from bfa9e36 to fd1752b Compare June 12, 2026 15:55

ntkathole merged commit 209d7cd into feast-dev:master Jun 12, 2026
22 of 25 checks passed