On demand feature views (ODFVs) should support python dicts #2261

@adchia

Description

In some test benchmarks, running the transformations on regular Python dicts is much faster (up to ~10x) than running them on pandas in the online flow, which tends to be the more latency-sensitive one (offline flows seem to be ~40% slower if using vectorized operations).

Something that looks like:

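from typing import Any, Dict

from feast import Field
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float64

# driver_hourly_stats_view and val_to_add_request are assumed to be defined elsewhere.
@on_demand_feature_view(
    sources=[driver_hourly_stats_view, val_to_add_request],
    schema=[
        Field(name="conv_rate_plus_val1", dtype=Float64),
        Field(name="conv_rate_plus_val2", dtype=Float64),
    ],
    mode="python"  # proposed: inputs and outputs are plain dicts rather than pandas objects
)
def transformed_conv_rate(driver_hourly_stats: Dict[str, Any], vals_to_add: Dict[str, Any]) -> Dict[str, Any]:
    features = {}
    features['conv_rate_plus_val1'] = (driver_hourly_stats['conv_rate'] + vals_to_add['val_to_add'])
    features['conv_rate_plus_val2'] = (driver_hourly_stats['conv_rate'] + vals_to_add['val_to_add_2'])
    return features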

might be similar to what we want
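
To make the dict-in, dict-out shape concrete, here is a rough sketch of the same transformation as a plain Python function, without the Feast decorator, called on one online request. The input values and the names `stored_features` and `request_data` are made up for illustration. How Feast would actually hand the dicts to the function is not settled here.

```python
from typing import Any, Dict


def transformed_conv_rate(
    driver_hourly_stats: Dict[str, Any], vals_to_add: Dict[str, Any]
) -> Dict[str, Any]:
    # Same logic as the proposal above, operating on one row at a time.
    return {
        "conv_rate_plus_val1": driver_hourly_stats["conv_rate"] + vals_to_add["val_to_add"],
        "conv_rate_plus_val2": driver_hourly_stats["conv_rate"] + vals_to_add["val_to_add_2"],
    }


# Hypothetical inputs for a single online request (values are illustrative only).
stored_features = {"conv_rate": 0.5}
request_data = {"val_to_add": 1, "val_to_add_2": 2}

result = transformed_conv_rate(stored_features, request_data)
print(result)
```

For a single-row online lookup, this skips building a DataFrame just to add two numbers, which is presumably where the latency goes in the pandas path.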
