Can't retrieve the entity field #5156

@itay1551

Expected Behavior

When retrieving documents from an online store, we don't care about the embedding itself once the top-k retrieval is done; we only care about the document ID or some other metadata.

The current behavior doesn't allow retrieval of any field other than the "embedding" field, which is problematic.
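To make the request concrete, here is a minimal sketch of the expected behavior in plain Python, independent of Feast. The helper names (`cosine_distance`, `top_k_item_ids`) and the two-dimensional toy embeddings are assumptions made for illustration only; the point is that the caller gets back IDs and distances and never needs the vectors themselves.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k_item_ids(query, items, k):
    """Return the k nearest (item_id, distance) pairs, without the embeddings.

    `items` is a list of (item_id, embedding) tuples.
    """
    scored = [(item_id, cosine_distance(query, emb)) for item_id, emb in items]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy data: the embeddings are made up and far shorter than real ones.
items = [
    (167, [1.0, 0.0]),
    (1003, [0.0, 1.0]),
    (4895, [1.0, 1.0]),
]
result = top_k_item_ids([1.0, 0.1], items, k=2)
```

The result contains only `item_id` and `distance`, which is the shape I would like `retrieve_online_documents` to be able to return.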

Current Behavior

No field can be retrieved other than the vector field, that is, the one declared with

            dtype=Array(Float32),
            vector_index=True,

Steps to reproduce

Create a feature view with 2 fields:

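from datetime import timedelta

from feast import FeatureView, Field
from feast.types import Array, Float32, Int64

# item_embed_push_source is the PushSource defined elsewhere in the repo.
item_embedding_view = FeatureView(
    name="item_embedding",
    entities=["item_id"],
    ttl=timedelta(days=365 * 5),
    schema=[
        Field(name="item_id", dtype=Int64),
        Field(
            name="embedding",
            dtype=Array(Float32),
            vector_index=True,
            vector_search_metric="cosine",
        ),
    ],
    source=item_embed_push_source,
    online=True
)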

Then prepare the DataFrame to push to the online store, item_embed_df:

	item_id	embedding	event_timestamp
0	167	[0.12210895866155624, -0.11489687114953995, -1...	2025-03-17 09:53:12.138190
1	1003	[-0.9383772015571594, -1.7450543642044067, 0.7...	2025-03-17 09:53:12.138190
2	4895	[-0.9383772015571594, -1.7450543642044067, 0.7...	2025-03-17 09:53:12.138190
3	3812	[-0.8219010829925537, 0.1411619633436203, 1.12...	2025-03-17 09:53:12.138190
4	4941	[-0.6469292044639587, -1.1105291843414307, -0....	2025-03-17 09:53:12.138190
...	...	...	...
4995	1216	[0.12210895866155624, -0.11489687114953995, -1...	2025-03-17 09:53:12.138190
4996	4198	[-0.6469292044639587, -1.1105291843414307, -0....	2025-03-17 09:53:12.138190
4997	112	[0.5381854176521301, 0.2744300961494446, -1.46...	2025-03-17 09:53:12.138190
4998	2948	[-0.8219010829925537, 0.1411619633436203, 1.12...	2025-03-17 09:53:12.138190
4999	4708	[-0.8219010829925537, 0.1411619633436203, 1.12...	2025-03-17 09:53:12.138190

Push and materialize:

from datetime import datetime
from feast.data_source import PushMode
store.push('item_embed_push_source', item_embed_df, to=PushMode.ONLINE)
store.materialize_incremental(datetime.now(), feature_views=['item_embedding'])

Try to retrieve 'item_embedding:item_id' using the code:

store.retrieve_online_documents(
    query=user_embed,
    top_k=64,
    feature='item_embedding:item_id'
).to_df()

This returns an empty DataFrame with the columns item_id and distance when retrieving the entity field, and rows holding an empty list and a None distance when retrieving any other field (assuming one has been added to the feature view):

	other_field	distance
0	[]	None
1	[]	None
...	...	...
62	[]	None
63	[]	None

NOTE: If I request the feature 'item_embedding:embedding' instead of 'item_embedding:item_id', I do get results.
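Since the embedding itself can be retrieved, one possible stopgap is to map the returned vectors back to their IDs on the client side. The sketch below is plain Python and not part of Feast; `build_reverse_index`, `ids_for` and the rounding precision are hypothetical choices, and the toy rows stand in for the real item_embed_df. Keys are rounded because the stored values are Float32 and may not round-trip exactly; values that sit right at a rounding boundary can still miss. Several items can share one embedding (as in the DataFrame above), so each lookup returns a list of IDs.

```python
from collections import defaultdict


def _key(embedding, ndigits):
    """Hashable, rounded form of an embedding, tolerant of small float drift."""
    return tuple(round(x, ndigits) for x in embedding)


def build_reverse_index(rows, ndigits=5):
    """Map each rounded embedding to every item_id that carries it.

    `rows` is an iterable of (item_id, embedding) pairs.
    """
    index = defaultdict(list)
    for item_id, embedding in rows:
        index[_key(embedding, ndigits)].append(item_id)
    return index


def ids_for(embeddings, index, ndigits=5):
    """Look up the item_ids for each retrieved embedding ([] if unknown)."""
    return [index.get(_key(e, ndigits), []) for e in embeddings]


# Toy rows; two items deliberately share the same embedding.
rows = [(167, [0.1, 0.2]), (1003, [0.3, 0.4]), (4895, [0.3, 0.4])]
index = build_reverse_index(rows)
matches = ids_for([[0.3, 0.4], [0.1, 0.2], [9.0, 9.0]], index)
```

This is only a workaround: it requires keeping all embeddings on the client and cannot tell apart items with identical vectors, which is why retrieving the ID directly from the store would be preferable.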

Specifications

  • Version: '0.46.0'
  • Platform: PGvector locally
  • Subsystem: Fedora

Possible Solution

Add an option to retrieve fields other than the embedding, such as the entity key or other metadata.
