-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feat: Unify Feature Transformations and Feature Views #5747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
2899413 to
649c429
Compare
| from feast.feature_view import FeatureView as FV | ||
|
|
||
| for src in sources: | ||
| if not isinstance(src, (FV, type(src).__name__ == "FeatureView")): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if not isinstance(src, (FV, type(src).__name__ == "FeatureView")): | |
| if not isinstance(src, FeatureView): |
sdk/python/feast/feature_view.py
Outdated
| owner: str = "", | ||
| mode: Optional[Union["TransformationMode", str]] = None, | ||
| feature_transformation: Optional[Transformation] = None, | ||
| when: Optional[Union[TransformationTiming, str]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need updates to proto ? when and online_enabled not serialized in from_proto.
Also, modify __eq__() and __copy__() for consider these new attributes ?
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
8b578bf to
c77cb83
Compare
|
I think FeatureView as the actually data assets, I think makes more sense to having the API. |
… auto-inference Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Yeah, that's fair. @HaoXuAI I updated the PR, lmk what you think. |
sdk/python/feast/feature_view.py
Outdated
| self.transform_when = transform_when | ||
|
|
||
| # Auto-infer online setting based on transform_when pattern | ||
| if transform_when in [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So why batch on read/write forces online to be true?
I think we need to think about if we really need a new tranform_when config. And if we need it, what is its relation with online/offline.
I think it might be counterintuitive that if this featureview is only used by batch or stream or on demand. what if the user defines it today as one mode and wants to change it to another one or allow both mode in the future?
My personal pint of view is removing any config for this. As long as you define the transformation, it will be applied to run before writing to destination. Use online/offline to control the destination only, if both are off, then no write. For any transformation needed on read, this should be configured at API level. Fo example, you can put get_online_feature(transform=true) to force transformation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to think about if we really need a new tranform_when config. And if we need it, what is its relation with online/offline.
Here's how things actually work:
- Batch
- Sometimes data can be transformed online
- On Reads
- On Writes
- Sometimes not, and that's basically a batch transform with no feature server transformation and pure materialization
- Streaming
- Handles historical data the same way it handles online serving
that's really what the transform_when indicates.
We need something other than online because the rest will be executed by the server and you can't orchestrate the transformation on write during retrieval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with renaming something but we need something equivalent to write_to_online_store that's more obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
online = true means write to online store.
transform_on_write should be removed from the server. it's a hacky way. #5283. this PR doesn't get enough reviews I think.
#5300. I think this one is also not ideal implementation as well.
When you use get_online_feature, you should assign it a transform=true to run transformation in "retrieval".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've read through the feature server code and on demand feature view implementation. I think a lot code needs to be refactored. And if you think there is some lines of code won't work you can tag me. I'll try to explain it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we agreed on this direction and I'm going to implement things that way. Just need to finish the work.
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
What this PR does / why we need it:
This PR refactors the transformation system with a cleaner architecture that separates transformation logic from execution. Transformations should focus on HOW to transform data, while FeatureViews handle WHEN and WHERE to execute.
Changes:
feature_transformationfield to FeatureView for transformation logictransform: bool = Trueparameter to API methods for per-call controltransform=False)Which issue(s) this PR fixes:
#4584, #5716, #5689
Misc