Support Data Labeling and LabelViews #5456

@franciscojavierarceo


Is your feature request related to a problem? Please describe.
Historically, Feast has played a key role in feature development, particularly in dataset preparation for model development and feature serving for online inference.

Pictorially, you can think of it like this:

Image

Yet labels are the core piece of a training dataset that makes model training successful. Without labels, features are a waste of time (excluding semi/self-supervised learning).

Given the work on the compute engine, my proposal is to expand Feast to cover the entire training dataset preparation life cycle, which would include labels and their correction.

A proof of concept was developed in the UI to educate users about this: #5410

We should expand this properly so that users can define a LabelView in the online store that can be used to store labels explicitly.

Describe the solution you'd like
A LabelView that can be used to write data to the online and offline store.
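
To make "write data to the online and offline store" concrete, here is a minimal in-memory sketch. It assumes the usual split where the offline store keeps the full history and the online store keeps only the latest value per entity. `LabelRow` and `InMemoryLabelStore` are hypothetical names for illustration, not Feast APIs, and the tie-breaking rule (a later `created_ts` overrides an earlier row for the same event) is an assumption about how label corrections might be handled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class LabelRow:
    customer_id: int
    churned: bool
    label_timestamp: datetime  # when the labeled event happened
    created_ts: datetime       # when this row was written; later rows act as corrections


class InMemoryLabelStore:
    """Hypothetical stand-in for the two storage targets of a LabelView."""

    def __init__(self) -> None:
        self.offline: List[LabelRow] = []      # append-only history, used for training
        self.online: Dict[int, LabelRow] = {}  # latest label per entity, used for serving

    def write(self, row: LabelRow) -> None:
        # The offline store keeps every row, including superseded ones.
        self.offline.append(row)
        current = self.online.get(row.customer_id)
        # Replace the online value only if the new row is at least as recent;
        # created_ts breaks ties so a correction overrides the original label.
        if current is None or (row.label_timestamp, row.created_ts) >= (
            current.label_timestamp,
            current.created_ts,
        ):
            self.online[row.customer_id] = row


store = InMemoryLabelStore()
event_time = datetime(2024, 1, 1)
store.write(LabelRow(1, False, event_time, datetime(2024, 1, 2)))
store.write(LabelRow(1, True, event_time, datetime(2024, 1, 5)))  # correction of the same event
```

After both writes, the offline list holds the original label and its correction, while the online dictionary holds only the corrected label for customer 1.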

Describe alternatives you've considered
No alternatives have been explored in detail. The LabelView API itself could look something like:

from datetime import timedelta

from feast import Entity, Field, FileSource, ValueType
from feast.types import Bool, Float32

# NOTE: LabelView is the class proposed in this issue; it is not part of Feast yet.

# 1) Define the entity the labels are keyed on
customer = Entity(name="customer_id", value_type=ValueType.INT64)

# 2) Point to your label data in e.g. Parquet
label_source = FileSource(
    path="gs://my-bucket/churn_labels/*.parquet",
    timestamp_field="label_timestamp",
    created_timestamp_column="created_ts",
)

# 3) Declare the LabelView
customer_churn = LabelView(
    name="customer_churn",
    entities=[customer],
    schema=[
        Field(name="churned", dtype=Bool),
        Field(name="risk_score", dtype=Float32),
    ],
    batch_source=label_source,
    ttl=timedelta(days=90),
    description="Customer churn flag and risk score for training/monitoring.",
)
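
One way labels declared like this could feed training dataset preparation is a point-in-time join: each label row picks up the feature value that was known at the label's timestamp, so no future information leaks into training. The sketch below is plain Python, not Feast code. `feature_as_of`, `build_training_rows`, and the sample data are hypothetical, and it assumes each customer's feature history is already sorted by timestamp.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical feature history: customer_id -> time-sorted (timestamp, value) pairs.
FeatureHistory = Dict[int, List[Tuple[datetime, float]]]
# Hypothetical label rows: (customer_id, label_timestamp, churned).
Label = Tuple[int, datetime, bool]


def feature_as_of(history: FeatureHistory, customer_id: int, ts: datetime) -> Optional[float]:
    """Return the latest feature value observed at or before ts, or None if there is none."""
    rows = history.get(customer_id, [])
    idx = bisect_right([row_ts for row_ts, _ in rows], ts)
    return rows[idx - 1][1] if idx > 0 else None


def build_training_rows(
    labels: List[Label], history: FeatureHistory
) -> List[Tuple[int, Optional[float], bool]]:
    """Pair every label with the feature value that was known at label time."""
    return [
        (customer_id, feature_as_of(history, customer_id, label_ts), churned)
        for customer_id, label_ts, churned in labels
    ]


history: FeatureHistory = {
    1: [(datetime(2024, 1, 1), 3.0), (datetime(2024, 2, 1), 7.0)],
}
labels: List[Label] = [
    (1, datetime(2024, 1, 15), False),  # only the January value is visible
    (1, datetime(2024, 2, 15), True),   # the February value is visible
    (2, datetime(2024, 2, 15), True),   # no feature history for this customer
]
training_rows = build_training_rows(labels, history)
```

The first label is paired with the January value rather than the later February one, which is the property that keeps the resulting training set free of leakage.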
