fix: Fix SparkRetrievalJob.persist() failing for SparkSource #6410

Merged
ntkathole merged 1 commit into feast-dev:master from ntkathole:fix_6261
Jun 12, 2026
@ntkathole

Member

What this PR does / why we need it:

Fixes #6261

SparkRetrievalJob.persist() failed in two scenarios:

  1. Remote offline store path: When using type: remote in feature_store.yaml pointing to a Spark offline server, the server calls SavedDatasetStorage.from_data_source(data_source) to convert the registered SparkSource into storage. This raised ValueError because SparkSource was not registered in the _DATA_SOURCE_TO_SAVED_DATASET_STORAGE mapping, and SavedDatasetSparkStorage lacked a from_data_source() method.

  2. Path-based SparkSource: When using a path-based SparkSource (e.g., S3 with parquet), persist() required a table name and raised ValueError if one wasn't provided, even though the storage had a valid path configured.

@ntkathole ntkathole self-assigned this May 16, 2026
@ntkathole ntkathole requested a review from a team as a code owner May 16, 2026 15:58
@ntkathole ntkathole force-pushed the fix_6261 branch 2 times, most recently from 83c73bc to b713e32 Compare June 9, 2026 06:08
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
@ntkathole ntkathole merged commit 209d7cd into feast-dev:master Jun 12, 2026
22 of 25 checks passed
Development

Successfully merging this pull request may close these issues.

SparkRetrievalJob.persist() fails due to missing SparkSource mapping in SavedDatasetStorage.from_data_source
