Description
Expected Behavior
When retrieving documents from an online store, we usually don't need the embedding itself after the top-k retrieval; we only need the document ID or some other metadata.
The current behavior doesn't allow retrieving any field other than the "embedding" field, which is problematic.
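To make the expected behavior concrete, here is a minimal pure-Python sketch of a top-k search that returns only the document ID and the distance, never the embedding. It is not Feast code: the helper names `cosine_distance` and `top_k_ids` and the two-dimensional toy embeddings are made up for illustration.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity); assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k_ids(query, documents, top_k):
    """Rank documents by distance to `query`; keep only item_id and distance."""
    scored = [
        {
            "item_id": doc["item_id"],
            "distance": cosine_distance(query, doc["embedding"]),
        }
        for doc in documents
    ]
    scored.sort(key=lambda row: row["distance"])
    return scored[:top_k]


# Toy data (hypothetical values, not the embeddings from this report).
documents = [
    {"item_id": 167, "embedding": [1.0, 0.0]},
    {"item_id": 1003, "embedding": [0.0, 1.0]},
    {"item_id": 4895, "embedding": [1.0, 1.0]},
]

results = top_k_ids([1.0, 0.0], documents, top_k=2)
for row in results:
    print(row["item_id"], round(row["distance"], 4))
```

The embedding is used only for ranking and is dropped from the result rows, which is the shape of output this issue asks for.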
Current Behavior
Only the field declared with the following options can be retrieved:
dtype=Array(Float32),
vector_index=True,
Steps to reproduce
Create a feature view with 2 fields:
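from datetime import timedelta

from feast import FeatureView, Field
from feast.types import Array, Float32, Int64

# item_embed_push_source is the push source defined elsewhere in the repo.
item_embedding_view = FeatureView(
    name="item_embedding",
    entities=["item_id"],
    ttl=timedelta(days=365 * 5),
    schema=[
        Field(name="item_id", dtype=Int64),
        Field(
            name="embedding",
            dtype=Array(Float32),
            vector_index=True,
            vector_search_metric="cosine",
        ),
    ],
    source=item_embed_push_source,
    online=True,
)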
And push a DataFrame to the online store
item_embed_df:
      item_id                                          embedding             event_timestamp
0         167  [0.12210895866155624, -0.11489687114953995, -1...  2025-03-17 09:53:12.138190
1        1003  [-0.9383772015571594, -1.7450543642044067, 0.7...  2025-03-17 09:53:12.138190
2        4895  [-0.9383772015571594, -1.7450543642044067, 0.7...  2025-03-17 09:53:12.138190
3        3812  [-0.8219010829925537, 0.1411619633436203, 1.12...  2025-03-17 09:53:12.138190
4        4941  [-0.6469292044639587, -1.1105291843414307, -0....  2025-03-17 09:53:12.138190
...       ...                                                ...                         ...
4995     1216  [0.12210895866155624, -0.11489687114953995, -1...  2025-03-17 09:53:12.138190
4996     4198  [-0.6469292044639587, -1.1105291843414307, -0....  2025-03-17 09:53:12.138190
4997      112  [0.5381854176521301, 0.2744300961494446, -1.46...  2025-03-17 09:53:12.138190
4998     2948  [-0.8219010829925537, 0.1411619633436203, 1.12...  2025-03-17 09:53:12.138190
4999     4708  [-0.8219010829925537, 0.1411619633436203, 1.12...  2025-03-17 09:53:12.138190
Push and materialize:
store.push('item_embed_push_source', item_embed_df, to=PushMode.ONLINE)
store.materialize_incremental(datetime.now(), feature_views=['item_embedding'])
Try to retrieve 'item_embedding:item_id' using the code:
store.retrieve_online_documents(
    query=user_embed,
    top_k=64,
    feature='item_embedding:item_id'
).to_df()
This returns an empty DataFrame (columns item_id and distance) when retrieving the entity field, and returns empty lists when retrieving any other field (assuming one was added to the FV):
    other_field  distance
0            []      None
1            []      None
...         ...       ...
62           []      None
63           []      None
NOTE: If I request the feature 'item_embedding:embedding' instead of 'item_embedding:item_id', I do get results.
Specifications
- Version: '0.46.0'
- Platform: PGvector locally
- Subsystem: Fedora
Possible Solution
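Add an option to retrieve fields other than the embedding (for example the entity ID or other metadata) for the top-k results.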
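As a rough illustration of what such an option could do, the sketch below parses feature references of the form 'view:field' (as used in the call above) and projects the ranked hits onto the requested fields plus the distance. The helpers `parse_feature_ref` and `project_hits` are hypothetical and are not part of the Feast API; the hit rows are toy values.

```python
def parse_feature_ref(ref):
    """Split 'view:field' into (view, field); reject malformed references."""
    view, sep, field = ref.partition(":")
    if not sep or not view or not field:
        raise ValueError(f"expected 'view:field', got {ref!r}")
    return view, field


def project_hits(hits, feature_refs):
    """Keep only the requested fields (plus distance) from each ranked hit."""
    fields = [parse_feature_ref(ref)[1] for ref in feature_refs]
    return [
        {**{name: hit[name] for name in fields}, "distance": hit["distance"]}
        for hit in hits
    ]


# Toy ranked hits, as a vector search might produce them internally
# (hypothetical values).
hits = [
    {"item_id": 167, "embedding": [1.0, 0.0], "distance": 0.0},
    {"item_id": 4895, "embedding": [1.0, 1.0], "distance": 0.29},
]

projected = project_hits(hits, ["item_embedding:item_id"])
print(projected)
```

With this shape, requesting 'item_embedding:item_id' yields the IDs and distances without the embedding, while requesting 'item_embedding:embedding' would still return the vectors as it does today.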