-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feat: Add UUID and TIME_UUID as feature types (#5885) #5951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat: Add UUID and TIME_UUID as feature types (#5885) #5951
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1d4cd01 to
4a5c932
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 OnlineResponse.to_arrow() crashes with ArrowInvalid when response contains UUID features
The to_dict() method in OnlineResponse now returns uuid.UUID objects for UUID/TIME_UUID features (via the updated feast_value_type_to_python_type at sdk/python/feast/type_map.py:132-133). However, to_arrow() at sdk/python/feast/online_response.py:107 passes this dict directly to pa.Table.from_pydict(), which internally calls pa.array(). PyArrow does not natively support uuid.UUID objects and will raise ArrowInvalid: Could not convert UUID('...') with type UUID.
Root Cause and Impact
Before this PR, UUID values were stored as string_val and deserialized as plain str by feast_value_type_to_python_type. After this PR, the dedicated uuid_val proto field causes deserialization to uuid.UUID objects. While to_dict() and to_df() (pandas handles uuid.UUID as object dtype) work correctly, to_arrow() breaks because PyArrow has no built-in conversion for uuid.UUID.
This also breaks the ODFV pandas/substrait transformation path in _augment_response_with_on_demand_transforms (sdk/python/feast/utils.py:694):
if initial_response_arrow is None:
initial_response_arrow = initial_response.to_arrow() # crashes hereAny OnDemandFeatureView with mode="pandas" or mode="substrait" whose source feature view contains UUID features will fail at serving time.
Actual: pa.Table.from_pydict({"uuid_col": [uuid.UUID("...")]}) raises ArrowInvalid.
Expected: UUID values should be converted to strings before creating the Arrow table, consistent with the mapping Uuid: pyarrow.string() defined in sdk/python/feast/types.py:270.
(Refers to lines 99-107)
Was this helpful? React with 👍 or 👎 to provide feedback.
|
@soooojinlee , thanks so much for putting this together! Can you rebase to bring this PR up to date? |
Signed-off-by: soojin <soojin@dable.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add uuid_val, time_uuid_val, uuid_list_val, time_uuid_list_val as dedicated oneof fields in the Value proto message, replacing the previous reuse of string_val/string_list_val. This allows UUID types to be identified from the proto field alone without requiring a feature_types side-channel. Backward compatibility is maintained for data previously stored as string_val. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: soojin <soojin@dable.io>
2c56521 to
1fd106a
Compare
What this PR does / why we need it:
Adds
UUIDandTIME_UUIDas native Feast feature types, resolving #5885. Currently UUID values must be stored as STRING, which loses type semantics, prevents backend-specific features (e.g. Cassandra timeuuid range queries), and makes PostgreSQLuuidcolumns infer as STRING. This PR enables users to declare UUID features withField(name="user_id", dtype=Uuid)and receiveuuid.UUIDobjects fromget_online_features().to_dict().Design Decisions
Why two types (UUID vs TIME_UUID)?
The issue author explicitly requested distinguishing time-based UUID (uuid1) and random UUID (uuid4). Both serialize
identically to
stringin proto, but separate types allow expressing intent in feature definitions and enable future backend-specific optimizations.Why dedicated proto fields (
uuid_val,time_uuid_val)?Following the pattern established by SET types (PR #5888) and UNIX_TIMESTAMP (which reuses
int64/Int64List), we add dedicated oneof fields that reuse existing proto scalar types (stringandStringList). This allowsWhichOneof("val")to identify UUID types directly from the proto message, without requiring a side-channel.Backward compatibility for data stored before this change:
OnlineResponseaccepts an optionalfeature_typesdict. When data was previously stored asstring_val, this metadata enablesfeast_value_type_to_python_type()to convert it touuid.UUID. New materializations useuuid_val/time_uuid_valand are identified automatically.Changes
Value.proto, generated*_pb2.py/*_pb2.pyiUUID=30,TIME_UUID=31,UUID_LIST=32,TIME_UUID_LIST=33toValueType.Enum; adduuid_val,time_uuid_val,uuid_list_val,time_uuid_list_valtoValue.oneofvalue_type.py,types.pyUUID,TIME_UUID,UUID_LIST,TIME_UUID_LISTenums andUuid/TimeUuidaliasestype_map.pystring_valtouuid_val; addPROTO_VALUE_TO_VALUE_TYPE_MAPentries for UUID fieldsonline_response.py,online_store.py,feature_store.py,utils.pyfeature_typesmetadata for backward-compatible deserializationon_demand_feature_view.pyBackward Compatibility
string_valstill deserializes correctly via thefeature_typesside-channeluuid_val/time_uuid_valproto fieldsfeast_value_type_to_python_type(v)withoutfeature_typenow returnsuuid.UUIDforuuid_valfields (previously returned plain string forstring_val)ValueType.UUID(previouslyValueType.STRING)Tests
test_types.py: Uuid/TimeUuid ↔ ValueType bidirectional conversion, Array typestest_type_map.py: Proto roundtrip withuuid_val,uuid.UUIDobject return, backward compatibility forstring_val, UUID list roundtrip, PostgreSQL mapping