Unified Push API to offline and online stores #2732
Closed
Labels
Community Contribution Needed · kind/feature · kind/project · priority/p1
Description
Problem
It's difficult to keep streaming features (e.g. from Kafka + Spark Streaming) or push features (e.g. computed at request time) consistently available at both training and serving time.
With streaming features today, users would need to either:
- write transformed features to both offline and online stores:
  - stream 1 -> transform -> stream 2
  - stream 2 -> write to offline store
  - stream 2 -> write to online store via `feature_store.push(df, push_source)`
- use both batch + stream transformations:
  - stream 1 -> offline store (raw events)
  - stream 1 -> transform -> stream 2 -> online store via `feature_store.push(df, push_source)`
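The duplication in the first option can be sketched with a minimal in-memory model. Everything here (the `transform` logic, the dict/list stand-ins for the stores) is illustrative, not the Feast API; the point is that one stream handler must be wired into two sinks:

```python
# Illustrative model of today's dual-write pattern (not Feast APIs):
# the same transformed row must be written to two separate sinks.

def transform(event):
    # hypothetical transformation, e.g. deriving miles driven from odometer readings
    return {"driver_id": event["driver_id"],
            "daily_miles_driven": event["end_odo"] - event["start_odo"]}

offline_store = []   # append-only history, used to build training data
online_store = {}    # latest value per entity, used at serving time

def handle_stream_event(event):
    row = transform(event)                # stream 1 -> transform -> stream 2
    offline_store.append(row)             # stream 2 -> write to offline store
    online_store[row["driver_id"]] = row  # stream 2 -> write to online store

handle_stream_event({"driver_id": "d1", "start_odo": 100, "end_odo": 130})
handle_stream_event({"driver_id": "d1", "start_odo": 130, "end_odo": 145})

print(len(offline_store))                        # 2: full history retained
print(online_store["d1"]["daily_miles_driven"])  # 15: latest value only
```

If a data scientist changes `transform`, both write paths must stay in sync, which is exactly the consistency burden described above.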
This issue compounds as data scientists iterate on transformations while training their model, and engineers must continuously re-implement those transformations for model serving.
Potential solution
A simple solution may be to allow:
- `FeatureView` to have an `offline=True` option
- `feature_store.push` to also append to an existing table in the offline store (e.g. a data warehouse) that matches the feature view name
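Under that proposal, a single push call would fan out to both stores. A minimal sketch of the intended behavior, using dicts as stand-ins for the stores (the `push` signature and `offline` flag here are hypothetical, mirroring the options proposed above, not a committed Feast API):

```python
# Hypothetical sketch of the proposed unified push: one call appends to the
# offline table named after the feature view and upserts the online store.

offline_tables = {}  # feature view name -> appended rows (history)
online_store = {}    # (feature view name, entity key) -> latest row

def push(feature_view_name, rows, entity_key, offline=True):
    if offline:
        # append to the offline-store table matching the feature view name
        offline_tables.setdefault(feature_view_name, []).extend(rows)
    for row in rows:
        # upsert: the online store keeps only the latest row per entity
        online_store[(feature_view_name, row[entity_key])] = row

push("driver_daily_features",
     [{"driver": "d1", "daily_miles_driven": 15.0}], entity_key="driver")
push("driver_daily_features",
     [{"driver": "d1", "daily_miles_driven": 22.0}], entity_key="driver")

print(len(offline_tables["driver_daily_features"]))  # 2: both rows kept offline
print(online_store[("driver_daily_features", "d1")]["daily_miles_driven"])  # 22.0
```

The transformation then lives upstream of a single call site, so training and serving consume the same pushed values.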
Alternatives
- Pushing to the original data source. This works today in Feast if there are no transformations, but given that feature views will soon have transformations (e.g. Batch transformations #2730 or Stream transformations #2597), this would be inconsistent (`feature_store.push` should push transformed features, not raw data)
- Feast ingestion (e.g. submitting jobs to Spark) from a topic to both offline + online store sinks
Appendix
Background
Currently, Feast supports pushing features to the online store (https://docs.feast.dev/reference/data-sources/push).
An example may be:
- Definition of features:

```python
driver_stats_push_source = PushSource(
    name="driver_stats_push_source",
    batch_source=driver_stats,
)

driver_daily_features_view = FeatureView(
    name="driver_daily_features",
    entities=["driver"],
    ttl=timedelta(seconds=8640000000),
    schema=[Field(name="daily_miles_driven", dtype=Float32)],
    online=True,
    source=driver_stats_push_source,
    tags={"production": "True"},
    owner="test2@gmail.com",
)
```
- Pushing features to the online store (e.g. from Spark):

```python
store.push("driver_stats_push_source", pandas_df)
```