Summary
Add an opt-in pre-computed feature vector capability to dramatically reduce online serving latency for the Python feature server. Instead of assembling features from multiple feature views at read time, pre-compute and store the complete feature vector as a single serialized blob at write/materialize time.
Motivation
The current get_online_features bottleneck is architectural: each request must resolve feature views from the registry, issue sequential online_read calls per feature view, deserialize individual ValueProto objects per feature, assemble them into a GetOnlineFeaturesResponse protobuf, and then convert it to JSON via MessageToDict. All of this is O(features × entities) work in pure Python.
For latency-sensitive use cases, we need an improvement that cannot be achieved by optimizing the existing read path alone.
Proposal
At write time (materialize / push), after writing individual features to the online store as usual, also assemble the complete feature vector for each affected FeatureService and store it as a single serialized blob keyed by (project, entity_key, feature_service_name).
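The write-time step above can be sketched as follows. This is a minimal illustration, not the Feast implementation: `OnlineBlobStore`, `precompute_feature_vector`, and the JSON serialization are stand-ins for whatever key-value interface and serialization format the online store actually provides.

```python
import json


class OnlineBlobStore:
    """Illustrative in-memory stand-in for the online store's key-value API."""

    def __init__(self):
        self._kv = {}

    def put(self, key, blob):
        self._kv[key] = blob

    def get(self, key):
        return self._kv.get(key)


def precompute_feature_vector(store, project, entity_key,
                              feature_service_name, feature_view_rows):
    """After the usual per-feature writes, merge the rows from every feature
    view in the service into one vector and store it as a single blob keyed
    by (project, entity_key, feature_service_name)."""
    vector = {}
    for fv_name, row in feature_view_rows.items():
        # Namespace each feature by its feature view, mirroring the
        # "<view>__<feature>" convention for feature service outputs.
        for feature, value in row.items():
            vector[f"{fv_name}__{feature}"] = value
    key = (project, entity_key, feature_service_name)
    store.put(key, json.dumps(vector))
    return key
```

The key point is that this work happens once per materialization/push, amortizing the assembly cost that the current design pays on every read.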
At read time, when the request specifies a FeatureService with pre-computation enabled, bypass the per-feature-view assembly pipeline entirely: perform a single key lookup, deserialize one blob, and return.
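The read-time fast path then reduces to one lookup and one deserialization. A sketch, again with illustrative names and JSON as a stand-in serialization; returning None on a miss signals a fallback to the regular per-feature-view read path:

```python
import json


def get_precomputed_features(kv, project, entity_key, feature_service_name):
    """Fast path: a single key lookup and a single deserialization, with no
    per-feature-view reads, no ValueProto handling, and no protobuf assembly.
    Returns None on a miss so the caller can fall back to the standard
    get_online_features pipeline."""
    blob = kv.get((project, entity_key, feature_service_name))
    if blob is None:
        return None  # cache miss: fall back to the regular read path
    return json.loads(blob)
```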
Opt-in per FeatureService:
FeatureService(
    name="realtime_scoring",
    features=[user_fv, txn_fv, product_fv],
    precompute_online=True,  # new flag, default False
)
Only FeatureServices explicitly marked with precompute_online=True trigger pre-computation. This avoids write amplification for services that don't need it.
For FeatureServices containing on-demand feature views (ODFVs): pre-compute only the non-ODFV features as a partial vector, then apply the ODFV transforms at read time on top of the pre-computed base.
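The ODFV overlay could look roughly like this. The transform signature here is illustrative (a callable taking the pre-computed base and request-time inputs), not the actual ODFV interface:

```python
import json


def read_with_odfv(kv, key, odfv_transforms, request_data):
    """Read the pre-computed partial vector, then apply on-demand transforms
    on top of it at request time. Only the ODFV portion is computed per
    request; the base vector is a single lookup."""
    base = json.loads(kv[key])
    out = dict(base)
    for transform in odfv_transforms:
        # Each transform sees the pre-computed base plus request-time inputs
        # and returns its derived features (illustrative signature).
        out.update(transform(base, request_data))
    return out


def txn_velocity(base, request_data):
    """Hypothetical ODFV: derive a per-day rate from a pre-computed count."""
    return {"odfv__txn_per_day": base["txn_fv__txn_count_7d"] / 7}
```

This keeps the latency win for the bulk of the vector while preserving request-time freshness for the transformed features.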