Materialization fails when field mappings are used to rename entity join keys (SnowflakeSource) #5942

@emreakg

Expected Behavior

Features are materialized to the online store successfully and persisted there using the field_mapping value as the name of the entity join key.

Current Behavior

Materialization fails before the offline store is queried, with: KeyError: Index(['USERID'], dtype='object')
This has only been tested with SnowflakeSource.

Steps to reproduce

Create a view in Snowflake to act as the feature source:

create view schema_name.test as (
    select 
        current_timestamp() as EVENT_TIMESTAMP,
        1 as USERID,
        0.75 as FEATURE_1,
        300 as FEATURE_2
);

Then define the entity, data source, and feature view:

user = Entity(name="user", join_keys=["user_id"], value_type=ValueType.STRING)

feature_source = SnowflakeSource(
    table="TEST",
    name="test_features",
    timestamp_field="EVENT_TIMESTAMP",
    database=dbname,
    schema=schema,
    field_mapping={
        "USERID": "user_id",
        "FEATURE_1": "feature_1",
        "FEATURE_2": "feature_2",
    },
)

feature_view = FeatureView(
    name="test_feature_view",
    entities=[user],
    schema=[
        Field(name="feature_1", dtype=types.Float32),
        Field(name="feature_2", dtype=types.Int32),
    ],
    online=True,
    source=feature_source,
)

Specifications

  • Version: 0.58.0
  • Platform: macOS
  • Subsystem: SQLite registry, SnowflakeSource

Inconsistencies

  • get_historical_features already works with the above configuration. Materialization behaving differently is inconsistent and confusing.
    • When user_id (the field_mapping value) is passed as input to get_historical_features, the offline store is queried with "USERID" (the field_mapping key), and the returned data again contains user_id.
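
The round trip described for get_historical_features can be sketched in plain Python. This is only a model of the renaming, not Feast's implementation: to_source_name and to_feast_names are hypothetical helpers, and the row reuses the values from the view in the reproduction steps.

```python
# Hypothetical helpers, not Feast APIs: they only model the renaming round trip.
FIELD_MAPPING = {
    "USERID": "user_id",
    "FEATURE_1": "feature_1",
    "FEATURE_2": "feature_2",
}


def to_source_name(feast_name: str, field_mapping: dict[str, str]) -> str:
    """Translate a Feast-side name back to the offline store column name."""
    reverse = {feast: source for source, feast in field_mapping.items()}
    return reverse.get(feast_name, feast_name)


def to_feast_names(row: dict, field_mapping: dict[str, str]) -> dict:
    """Rename offline store columns in a row to their Feast-side names."""
    return {field_mapping.get(col, col): value for col, value in row.items()}


# The caller supplies the join key under its Feast-side name (mapping value).
query_column = to_source_name("user_id", FIELD_MAPPING)

# A row shaped like the view in the reproduction steps (mapping keys).
source_row = {"USERID": 1, "FEATURE_1": 0.75, "FEATURE_2": 300}
result_row = to_feast_names(source_row, FIELD_MAPPING)
```

Under this model, the expectation in this issue is that materialization applies the same two translations, so the online store ends up keyed by user_id rather than USERID.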

Discussion

The join_keys argument of Entity already provides a way to map offline store column names to Feast registry field/key names for entity keys. If this is considered the de facto way to manage source column names, then this issue can be closed.

However, I think it's worth exploring. Doing the renaming at the entity level means that all tables in the offline store need to use the same column name for that entity key. There may be cases where different teams refer to the same entity by different names simply because the business context differs.
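
As a rough illustration of that use case, the sketch below shows two tables that name the same entity key differently, each with its own mapping onto a single join key. The table names, the CUSTOMER_ID and INVOICE_COUNT columns, and normalize_row are all hypothetical; only the idea of a per-source mapping comes from this issue.

```python
# Hypothetical table and column names; only the per-source mapping idea matters.
SOURCE_FIELD_MAPPINGS = {
    "GROWTH.USER_FEATURES": {"USERID": "user_id"},
    "BILLING.CUSTOMER_FEATURES": {"CUSTOMER_ID": "user_id"},
}


def normalize_row(table: str, row: dict) -> dict:
    """Rename a row's columns using the mapping declared for its source table."""
    mapping = SOURCE_FIELD_MAPPINGS[table]
    return {mapping.get(col, col): value for col, value in row.items()}


# Each team keeps its own column name in the offline store.
growth_row = normalize_row(
    "GROWTH.USER_FEATURES", {"USERID": 1, "FEATURE_1": 0.75}
)
billing_row = normalize_row(
    "BILLING.CUSTOMER_FEATURES", {"CUSTOMER_ID": 1, "INVOICE_COUNT": 4}
)
```

Both rows end up keyed by user_id, which a single join_keys value on the entity cannot express when the source column names differ.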
