
@soooojinlee soooojinlee commented Feb 8, 2026

What this PR does / why we need it:

Adds UUID and TIME_UUID as native Feast feature types, resolving #5885. Currently UUID values must be stored as STRING, which loses type semantics, prevents backend-specific features (e.g. Cassandra timeuuid range queries), and makes PostgreSQL uuid columns infer as STRING. This PR enables users to declare UUID features with Field(name="user_id", dtype=Uuid) and receive uuid.UUID objects from get_online_features().to_dict().

Design Decisions

Why two types (UUID vs TIME_UUID)?
The issue author explicitly requested distinguishing between time-based UUIDs (uuid1) and random UUIDs (uuid4). Both serialize identically to strings in the proto, but separate types let feature definitions express intent and enable future backend-specific optimizations.
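Because uuid1 and uuid4 values share the same canonical string form, the version distinction lives in the UUID's bits rather than in the serialized text. The sketch below uses only Python's standard `uuid` module to show this, and how a TIME_UUID's embedded timestamp can be recovered (the property that backs Cassandra-style timeuuid range queries). `uuid_kind` and `time_uuid_timestamp` are hypothetical helpers for illustration, not part of this PR or of Feast.

```python
import uuid
from datetime import datetime, timedelta, timezone

# RFC 4122 version-1 timestamps count 100-ns intervals since this date (UTC).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid_kind(value: uuid.UUID) -> str:
    """Hypothetical: classify a UUID as TIME_UUID (version 1) or UUID (any other version)."""
    return "TIME_UUID" if value.version == 1 else "UUID"


def time_uuid_timestamp(value: uuid.UUID) -> datetime:
    """Hypothetical: recover the creation time embedded in a version-1 UUID."""
    if value.version != 1:
        raise ValueError("only version-1 (time-based) UUIDs embed a timestamp")
    # value.time is the 60-bit timestamp in 100-ns units; // 10 converts it to microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=value.time // 10)


time_based, random_based = uuid.uuid1(), uuid.uuid4()
# Both serialize to a 36-character canonical string and parse back to the same kind.
print(str(time_based), uuid_kind(uuid.UUID(str(time_based))))
print(str(random_based), uuid_kind(uuid.UUID(str(random_based))))
```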

Why dedicated proto fields (uuid_val, time_uuid_val)?
Following the pattern established by SET types (PR #5888) and UNIX_TIMESTAMP (which reuses int64/Int64List), we add dedicated oneof fields that reuse existing proto scalar types (string and StringList). This allows WhichOneof("val") to identify UUID types directly from the proto message, without requiring a side-channel.
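Because the UUID fields are ordinary oneof members, identifying a UUID value reduces to a lookup on the field name returned by `WhichOneof("val")`. A minimal sketch, assuming a plain-Python stand-in for the generated enum: `UuidValueType`, `ONEOF_FIELD_TO_UUID_TYPE`, and `uuid_type_from_oneof` are hypothetical names, and only the field names and enum numbers come from this PR.

```python
from enum import IntEnum
from typing import Optional


class UuidValueType(IntEnum):
    # Enum numbers as listed in this PR's ValueType.Enum change.
    UUID = 30
    TIME_UUID = 31
    UUID_LIST = 32
    TIME_UUID_LIST = 33


# Hypothetical stand-in for the PROTO_VALUE_TO_VALUE_TYPE_MAP entries:
# oneof field name -> ValueType.
ONEOF_FIELD_TO_UUID_TYPE = {
    "uuid_val": UuidValueType.UUID,
    "time_uuid_val": UuidValueType.TIME_UUID,
    "uuid_list_val": UuidValueType.UUID_LIST,
    "time_uuid_list_val": UuidValueType.TIME_UUID_LIST,
}


def uuid_type_from_oneof(field_name: Optional[str]) -> Optional[UuidValueType]:
    """Return the UUID ValueType for a oneof field name, or None for non-UUID fields."""
    if field_name is None:  # WhichOneof returns None when no field in the oneof is set
        return None
    return ONEOF_FIELD_TO_UUID_TYPE.get(field_name)
```

With a real proto message, the call would look like `uuid_type_from_oneof(value_proto.WhichOneof("val"))`; no feature metadata is needed for data written with the new fields.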

Backward compatibility for data stored before this change:
OnlineResponse accepts an optional feature_types dict. When data was previously stored as string_val, this metadata enables feast_value_type_to_python_type() to convert it to uuid.UUID. New materializations use uuid_val/time_uuid_val and are identified automatically.
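The two decoding paths (new dedicated fields versus legacy string_val plus feature_types metadata) can be sketched as follows. This is a simplified model, not Feast's `feast_value_type_to_python_type`: `decode_value` is hypothetical, feature types are passed as plain strings standing in for ValueType members, and list payloads are plain Python lists rather than the proto's StringList wrapper.

```python
import uuid
from typing import Any, Optional

_SCALAR_UUID_FIELDS = {"uuid_val", "time_uuid_val"}
_LIST_UUID_FIELDS = {"uuid_list_val", "time_uuid_list_val"}
_UUID_FEATURE_TYPES = {"UUID", "TIME_UUID"}
_UUID_LIST_FEATURE_TYPES = {"UUID_LIST", "TIME_UUID_LIST"}


def decode_value(field_name: str, payload: Any, feature_type: Optional[str] = None) -> Any:
    """Hypothetical decoder: turn a (oneof field, payload) pair into a Python value."""
    # New data: the oneof field itself identifies the value as a UUID.
    if field_name in _SCALAR_UUID_FIELDS:
        return uuid.UUID(payload)
    if field_name in _LIST_UUID_FIELDS:
        return [uuid.UUID(s) for s in payload]
    # Legacy data: stored as plain strings, so rely on the declared feature type.
    if field_name == "string_val" and feature_type in _UUID_FEATURE_TYPES:
        return uuid.UUID(payload)
    if field_name == "string_list_val" and feature_type in _UUID_LIST_FEATURE_TYPES:
        return [uuid.UUID(s) for s in payload]
    # Anything else passes through unchanged.
    return payload
```

Without feature_type metadata, a legacy string_val still decodes as a plain str, which is why the side-channel is only needed for data materialized before this change.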

Changes

| Layer | Files | Description |
|---|---|---|
| Proto | Value.proto, generated `*_pb2.py`/`*_pb2.pyi` | Add UUID=30, TIME_UUID=31, UUID_LIST=32, TIME_UUID_LIST=33 to ValueType.Enum; add uuid_val, time_uuid_val, uuid_list_val, time_uuid_list_val to Value.oneof |
| Type system | value_type.py, types.py | Add UUID, TIME_UUID, UUID_LIST, TIME_UUID_LIST enums and Uuid/TimeUuid aliases |
| Type conversion | type_map.py | Add mappings to ~11 conversion dicts (proto, PyArrow, pandas, PostgreSQL, Couchbase, Snowflake); switch from string_val to uuid_val; add PROTO_VALUE_TO_VALUE_TYPE_MAP entries for UUID fields |
| Online response | online_response.py, online_store.py, feature_store.py, utils.py | Pass feature_types metadata for backward-compatible deserialization |
| ODFV | on_demand_feature_view.py | Add UUID/TIME_UUID sample values for schema inference |

Backward Compatibility

  • Data previously stored as string_val still deserializes correctly via the feature_types side-channel
  • New materializations use dedicated uuid_val/time_uuid_val proto fields
  • feast_value_type_to_python_type(v) without feature_type now returns uuid.UUID for uuid_val fields (previously, UUID values were stored as string_val and returned as plain str)
  • PostgreSQL uuid columns now infer as ValueType.UUID (previously ValueType.STRING)
  • Go SDK: proto changes compile without errors; UUID handling logic is not implemented (out of scope)
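One consequence of the third bullet worth checking in downstream code: a `uuid.UUID` never compares equal to a `str`, so a comparison that worked when values came back as plain strings now evaluates to False without raising. A minimal sketch of a normalizing comparison; `same_uuid` is a hypothetical helper, not part of Feast.

```python
import uuid
from typing import Union

UuidLike = Union[uuid.UUID, str]


def same_uuid(a: UuidLike, b: UuidLike) -> bool:
    """Hypothetical helper: compare values that may be uuid.UUID objects or UUID strings."""

    def to_uuid(value: UuidLike) -> uuid.UUID:
        # uuid.UUID() also accepts uppercase hex, braces, and a "urn:uuid:" prefix.
        return value if isinstance(value, uuid.UUID) else uuid.UUID(value)

    return to_uuid(a) == to_uuid(b)


stored = "12345678-1234-5678-1234-567812345678"
returned = uuid.UUID(stored)  # the kind of value to_dict() now yields for uuid_val fields
print(returned == stored)  # a UUID and a str never compare equal
print(same_uuid(returned, stored))
```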

Tests

  • test_types.py: Uuid/TimeUuid ↔ ValueType bidirectional conversion, Array types
  • test_type_map.py: Proto roundtrip with uuid_val, uuid.UUID object return, backward compatibility for string_val, UUID list roundtrip, PostgreSQL mapping
  • All 78 unit tests passing
  • ruff lint and format checks passing

@soooojinlee soooojinlee requested review from a team as code owners February 8, 2026 14:55
@soooojinlee soooojinlee requested review from ejscribner, robhowley and tokoko and removed request for a team February 8, 2026 14:56

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 potential issues.

View 6 additional findings in Devin Review.

@soooojinlee soooojinlee force-pushed the feat/add-uuid-feature-types branch from 1d4cd01 to 4a5c932 on February 8, 2026 15:00

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 12 additional findings in Devin Review.

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 new potential issues.

View 15 additional findings in Devin Review.

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 new potential issues.

View 15 additional findings in Devin Review.

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 16 additional findings in Devin Review.

🔴 OnlineResponse.to_arrow() crashes with ArrowInvalid when response contains UUID features

The to_dict() method in OnlineResponse now returns uuid.UUID objects for UUID/TIME_UUID features (via the updated feast_value_type_to_python_type at sdk/python/feast/type_map.py:132-133). However, to_arrow() at sdk/python/feast/online_response.py:107 passes this dict directly to pa.Table.from_pydict(), which internally calls pa.array(). PyArrow does not natively support uuid.UUID objects and will raise ArrowInvalid: Could not convert UUID('...') with type UUID.

Root Cause and Impact

Before this PR, UUID values were stored as string_val and deserialized as plain str by feast_value_type_to_python_type. After this PR, the dedicated uuid_val proto field causes deserialization to uuid.UUID objects. While to_dict() and to_df() (pandas handles uuid.UUID as object dtype) work correctly, to_arrow() breaks because PyArrow has no built-in conversion for uuid.UUID.

This also breaks the ODFV pandas/substrait transformation path in _augment_response_with_on_demand_transforms (sdk/python/feast/utils.py:694):

```python
if initial_response_arrow is None:
    initial_response_arrow = initial_response.to_arrow()  # crashes here
```

Any OnDemandFeatureView with mode="pandas" or mode="substrait" whose source feature view contains UUID features will fail at serving time.

Actual: pa.Table.from_pydict({"uuid_col": [uuid.UUID("...")]}) raises ArrowInvalid.

Expected: UUID values should be converted to strings before creating the Arrow table, consistent with the mapping Uuid: pyarrow.string() defined in sdk/python/feast/types.py:270.
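A minimal sketch of the suggested fix, assuming it is applied to the dict produced by to_dict() before the Arrow table is built: convert every uuid.UUID (including those inside list-valued UUID_LIST columns) to its canonical string, matching the `Uuid: pyarrow.string()` mapping. `stringify_uuids` is a hypothetical helper name, not existing Feast code, and the sketch uses only the standard library so it runs without PyArrow installed.

```python
import uuid
from typing import Any, Dict, List


def stringify_uuids(data: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Hypothetical: copy a column-oriented dict, replacing uuid.UUID values with str."""

    def convert(value: Any) -> Any:
        if isinstance(value, uuid.UUID):
            return str(value)
        if isinstance(value, list):  # list-valued columns such as UUID_LIST
            return [convert(v) for v in value]
        return value  # None, numbers, strings, etc. pass through unchanged

    return {name: [convert(v) for v in column] for name, column in data.items()}
```

In to_arrow(), the dict passed to pa.Table.from_pydict() at line 107 would be wrapped with this helper. Converting at the Arrow boundary keeps to_dict() returning uuid.UUID objects while Arrow consumers, including the ODFV pandas/substrait path, receive plain strings.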

(Refers to lines 99-107)

@nquinn408

@soooojinlee , thanks so much for putting this together! Can you rebase to bring this PR up to date?

soooojinlee and others added 7 commits February 11, 2026 10:14
Signed-off-by: soojin <soojin@dable.io>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add uuid_val, time_uuid_val, uuid_list_val, time_uuid_list_val as
dedicated oneof fields in the Value proto message, replacing the
previous reuse of string_val/string_list_val. This allows UUID types
to be identified from the proto field alone without requiring a
feature_types side-channel. Backward compatibility is maintained for
data previously stored as string_val.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io>
@ntkathole ntkathole force-pushed the feat/add-uuid-feature-types branch from 2c56521 to 1fd106a on February 11, 2026 04:44