Description
Is your feature request related to a problem? Please describe.
Historically, Feast has played a key role in feature development, particularly around dataset preparation for model development and feature serving for online inference.
Pictorially, you can think of it like this:
Yet labels are the core piece of a training dataset that makes model training successful. Without labels, features are a waste of time (excluding semi/self-supervised learning).
Given the work on the compute engine, my proposal is to expand Feast to cover the entire training dataset preparation life cycle, including labels and their correction.
A proof of concept was developed in the UI to educate users about this; see #5410.
We should expand this properly so that users can define a LabelView in the online store that can be used to store labels explicitly.
Describe the solution you'd like
A LabelView that can be used to write data to the online and offline store.
Describe alternatives you've considered
It could look something like:
from datetime import timedelta

from feast import Entity, Field, FileSource, ValueType
from feast.types import Bool, Float32
# LabelView is the class proposed in this issue; it does not exist in Feast yet.

# 1) Define the entity the labels are keyed on
customer = Entity(name="customer_id", value_type=ValueType.INT64)

# 2) Point to your label data in e.g. Parquet
label_source = FileSource(
    path="gs://my-bucket/churn_labels/*.parquet",
    timestamp_field="label_timestamp",
    created_timestamp_column="created_ts",
)

# 3) Declare the LabelView
customer_churn = LabelView(
    name="customer_churn",
    entities=[customer],
    schema=[
        Field(name="churned", dtype=Bool),
        Field(name="risk_score", dtype=Float32),
    ],
    batch_source=label_source,
    ttl=timedelta(days=90),
    description="Customer churn flag and risk score for training/monitoring.",
)
Additional context
Add any other context or screenshots about the feature request here.
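To make "labels and their correction" a bit more concrete, here is a rough, Feast-independent sketch in plain Python of the behavior I have in mind. It is not how Feast or a LabelView would actually implement this. It assumes a correction is a later write for the same entity and `label_timestamp` (so the newest `created_ts` wins), and that each label is joined to the latest feature row at or before the label's timestamp. The `orders_30d` feature, the `event_timestamp` column, and both helper functions are hypothetical names used only for illustration.

```python
from datetime import date

# Hypothetical label rows. Customer 1 was first labeled churned=False and
# later corrected to True (same label_timestamp, newer created_ts).
labels = [
    {"customer_id": 1, "label_timestamp": date(2024, 1, 10),
     "created_ts": date(2024, 1, 11), "churned": False},
    {"customer_id": 1, "label_timestamp": date(2024, 1, 10),
     "created_ts": date(2024, 1, 20), "churned": True},
    {"customer_id": 2, "label_timestamp": date(2024, 1, 12),
     "created_ts": date(2024, 1, 13), "churned": False},
]

# Hypothetical feature rows for the same customers.
features = [
    {"customer_id": 1, "event_timestamp": date(2024, 1, 1), "orders_30d": 5},
    {"customer_id": 1, "event_timestamp": date(2024, 1, 9), "orders_30d": 2},
    {"customer_id": 2, "event_timestamp": date(2024, 1, 5), "orders_30d": 7},
]


def resolve_corrections(label_rows):
    """Keep one row per (customer_id, label_timestamp): the newest created_ts."""
    latest = {}
    # Iterating in created_ts order means later writes overwrite earlier ones.
    for row in sorted(label_rows, key=lambda r: r["created_ts"]):
        latest[(row["customer_id"], row["label_timestamp"])] = row
    return list(latest.values())


def point_in_time_join(label_rows, feature_rows):
    """Attach to each resolved label the latest feature row at or before it."""
    joined = []
    for label in resolve_corrections(label_rows):
        candidates = [
            f for f in feature_rows
            if f["customer_id"] == label["customer_id"]
            and f["event_timestamp"] <= label["label_timestamp"]
        ]
        if not candidates:
            continue  # no feature values were known yet for this label
        newest = max(candidates, key=lambda f: f["event_timestamp"])
        joined.append({**label, "orders_30d": newest["orders_30d"]})
    return sorted(joined, key=lambda r: r["customer_id"])


training_rows = point_in_time_join(labels, features)
```

The point of the sketch is only that the corrected label replaces the original one before the training dataset is assembled, without the features needing to change.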
