Fetching online features with the wrong entities is not validated #3270

@chhabrakadabra

Expected Behavior

When feature_store.get_online_features is called with the wrong entity (one that exists but has nothing to do with the features being requested), I would expect validation logic to catch the mismatch ahead of time and raise an error whose message explains the problem. Instead, the call crashes unexpectedly.

Current Behavior

The call currently fails with the following error, which does not tell the user what the actual problem is:

Traceback (most recent call last):
  File "bug_repro.py", line 5, in <module>
    feature_vector = store.get_online_features(
  File "/Users/abhin/src/github.com/chhabrakadabra/feast/.venv38/lib/python3.8/site-packages/feast/usage.py", line 294, in wrapper
    raise exc.with_traceback(traceback)
  File "/Users/abhin/src/github.com/chhabrakadabra/feast/.venv38/lib/python3.8/site-packages/feast/usage.py", line 283, in wrapper
    return func(*args, **kwargs)
  File "/Users/abhin/src/github.com/chhabrakadabra/feast/.venv38/lib/python3.8/site-packages/feast/feature_store.py", line 1588, in get_online_features
    return self._get_online_features(
  File "/Users/abhin/src/github.com/chhabrakadabra/feast/.venv38/lib/python3.8/site-packages/feast/feature_store.py", line 1775, in _get_online_features
    table_entity_values, idxs = self._get_unique_entities(
  File "/Users/abhin/src/github.com/chhabrakadabra/feast/.venv38/lib/python3.8/site-packages/feast/feature_store.py", line 1972, in _get_unique_entities
    unique_entities, indexes = tuple(
ValueError: not enough values to unpack (expected 2, got 0)
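
The final message is what Python raises when an empty sequence is unpacked into two names. The sketch below is a plain-Python illustration of that failure mode, not Feast's code: `unpack_pairs` is a hypothetical helper, and the assumption is that no entity values matched the feature view's join key, so there was nothing to unpack.

```python
from typing import Any, Iterable, Tuple


def unpack_pairs(pairs: Iterable[Tuple[Any, Any]]) -> Tuple[tuple, tuple]:
    """Hypothetical helper: split (entity, indexes) pairs into two tuples.

    Mirrors the shape of the failing statement in the traceback,
    `unique_entities, indexes = tuple(...)`, without using any Feast code.
    """
    # zip(*pairs) transposes the pairs; with no pairs it yields nothing,
    # so tuple(...) is empty and the two-name unpacking fails.
    unique_entities, indexes = tuple(zip(*pairs))
    return unique_entities, indexes


if __name__ == "__main__":
    # One pair unpacks cleanly into two one-element tuples.
    print(unpack_pairs([("driver_1004", [0])]))

    # No pairs reproduces the error text from the traceback.
    try:
        unpack_pairs([])
    except ValueError as exc:
        print(exc)
```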

Steps to reproduce

  • feast init my_project
  • cd my_project/feature_repo
  • Change the example_repo.py file to the following contents:
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Define an entity for the driver. You can think of an entity as a primary key used to
# fetch features.
driver = Entity(name="driver", join_keys=["driver_id"])
customer = Entity(name="customer", join_keys=["customer_id"])

# Read data from parquet files. Parquet is convenient for local development mode. For
# production, you can use your favorite DWH, such as BigQuery. See Feast documentation
# for more info.
driver_stats_source = FileSource(
    name="driver_hourly_stats_source",
    path="/Users/abhin/src/github.com/chhabrakadabra/feast/my_project/feature_repo/data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)

# Our parquet files contain sample data that includes a driver_id column, timestamps and
# three feature columns. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_stats_fv = FeatureView(
    # The unique name of this feature view. Two feature views in a single
    # project cannot have the same name
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    # The list of features defined below acts as a schema, both for
    # materializing features into a store and as references during
    # retrieval for building a training dataset or serving features
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    online=True,
    source=driver_stats_source,
    # Tags are user defined key/value pairs that are attached to each
    # feature view
    tags={"team": "driver_performance"},
)
  • feast apply
  • feast materialize '2018-01-01T00:00:00' '2022-10-04T17:26:07'
  • Create a file (I called it bug_repro.py) with the following contents:
from feast import FeatureStore

if __name__ == "__main__":
    store = FeatureStore()
    feature_vector = store.get_online_features(
        features=[
            "driver_hourly_stats:conv_rate",
            "driver_hourly_stats:acc_rate",
            "driver_hourly_stats:avg_daily_trips",
        ],
        entity_rows=[
            {"customer_id": 1004},
        ],
    ).to_dict()
    print(feature_vector)
  • Note that the entity rows provide customer_ids, which have nothing to do with the features being requested.
  • Run this file to reproduce the issue.

Specifications

  • Version: 0.25.1
  • Platform: osx
  • Subsystem:

Possible Solution

We need to gather the set of all entities associated with the features being requested and make sure that their join keys are present in the entity_rows. If any are missing, we should raise an error (a KeyError, for example) with a message that names the missing entities.
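
The check described above can be sketched as follows. This is a minimal standalone sketch, not a patch against Feast: `find_missing_join_keys` and `validate_entity_rows` are hypothetical names, and the mapping from feature view name to required join keys is assumed to have been built beforehand from the registered feature views.

```python
from typing import Any, Dict, List, Mapping, Sequence, Set


def find_missing_join_keys(
    required_join_keys: Mapping[str, Set[str]],
    entity_rows: Sequence[Mapping[str, Any]],
) -> Dict[str, List[str]]:
    """Hypothetical helper: per feature view, list the join keys that are
    absent from at least one entity row."""
    missing: Dict[str, List[str]] = {}
    for view_name, join_keys in required_join_keys.items():
        absent = sorted(
            key
            for key in join_keys
            if any(key not in row for row in entity_rows)
        )
        if absent:
            missing[view_name] = absent
    return missing


def validate_entity_rows(
    required_join_keys: Mapping[str, Set[str]],
    entity_rows: Sequence[Mapping[str, Any]],
) -> None:
    """Hypothetical helper: raise a KeyError naming the missing join keys."""
    missing = find_missing_join_keys(required_join_keys, entity_rows)
    if missing:
        details = "; ".join(
            f"{view} requires {keys}" for view, keys in sorted(missing.items())
        )
        raise KeyError(f"entity_rows are missing join keys: {details}")


if __name__ == "__main__":
    # Assumed input: driver_hourly_stats is keyed by driver_id, as in the repro.
    required = {"driver_hourly_stats": {"driver_id"}}
    try:
        validate_entity_rows(required, [{"customer_id": 1004}])
    except KeyError as exc:
        print(exc)
```

Running a check like this before the online lookup would turn the repro's unpacking crash into an error that names driver_id as the missing key.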
