Commit bc4ea5e

feat: Added improved labelView UI version 2
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
1 parent d671617 commit bc4ea5e

33 files changed

Lines changed: 2872 additions & 1512 deletions

docs/SUMMARY.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@
 * [Data ingestion](getting-started/concepts/data-ingestion.md)
 * [Entity](getting-started/concepts/entity.md)
 * [Feature view](getting-started/concepts/feature-view.md)
-* [\[Alpha\] Label view](getting-started/concepts/label-view.md)
+* [Label view](getting-started/concepts/label-view.md)
 * [Feature retrieval](getting-started/concepts/feature-retrieval.md)
 * [Point-in-time joins](getting-started/concepts/point-in-time-joins.md)
 * [\[Alpha\] Saved dataset](getting-started/concepts/dataset.md)

docs/adr/ADR-0012-label-view.md

Lines changed: 101 additions & 8 deletions
@@ -36,7 +36,7 @@ Today, Feast treats all data as immutable, append-only feature data. This works
 | **LabelView** | A new first-class Feast primitive (subclass of `BaseFeatureView`) that manages mutable labels keyed by entities. Stored in its own registry table/proto section. |
 | **ConflictPolicy** | An enum (`LAST_WRITE_WINS`, `LABELER_PRIORITY`, `MAJORITY_VOTE`) controlling how conflicting labels from different labelers are resolved. Enforced for offline store reads; online store uses LAST_WRITE_WINS. |
 | **labeler_field** | A designated schema field (default: `"labeler"`) that identifies which source wrote each label. Enables multi-labeler provenance tracking. |
-| **retain_history** | A boolean flag indicating whether full write history should be kept per entity key. Inherent to the offline store (all writes are appended); online store keeps latest only. |
+
 | **reference_feature_view** | Optional link to the `FeatureView` whose entities this label view annotates, for documentation and lineage. |
 | **PushSource integration** | Labels are ingested via `FeatureStore.push()` through a `PushSource`, writing to both online and offline stores in real time. |

@@ -66,7 +66,7 @@ BaseFeatureView (abstract)
 └── LabelView ← new
 ```

-LabelView inherits from `BaseFeatureView`, gaining the standard name, features, projection, `proto_class`, versioning (`version`, `current_version_number`), and schema infrastructure. It adds label-specific fields: `labeler_field`, `conflict_policy`, `retain_history`, and `reference_feature_view`.
+LabelView inherits from `BaseFeatureView`, gaining the standard name, features, projection, `proto_class`, versioning (`version`, `current_version_number`), and schema infrastructure. It adds label-specific fields: `labeler_field`, `conflict_policy`, `reference_feature_view`, and annotation profile metadata (via `tags`).

 ## Protobuf Schema

@@ -98,7 +98,7 @@ message LabelViewSpec {
     repeated FeatureSpecV2 entity_columns = 11;
     string labeler_field = 12;
     ConflictResolutionPolicy conflict_policy = 13;
-    bool retain_history = 14;
+    reserved 14; // was retain_history (removed — offline store always retains history)
     string reference_feature_view = 15;
 }

@@ -236,7 +236,6 @@ interaction_labels = LabelView(
     source=label_source,
     labeler_field="labeler",
     conflict_policy=ConflictPolicy.LAST_WRITE_WINS,
-    retain_history=True,
     reference_feature_view="interaction_history",
 )

@@ -329,11 +328,105 @@ label_write_permission = Permission(

 ---

+# Annotation Profiles
+
+LabelView supports **annotation profiles** — metadata that tells the Feast UI *how* labels should be created and edited. Profiles are declared in the existing `tags` dict using the `feast.io/` namespace, requiring no schema or proto changes.
+
+## Design Rationale
+
+Different labeling tasks require fundamentally different UX:
+
+| Task | Interaction | Example |
+|------|-------------|---------|
+| RAG retrieval evaluation | Highlight text spans in a document | Mark chunk relevance for retrieval QA |
+| RLHF reward labeling | Fill structured form per entity | Rate response quality, flag safety issues |
+| Bulk correction | Edit cells in a table | Fix automated labeler mistakes |
+| Active learning | Label model-selected high-value items | Annotate uncertain predictions first |
+
+Rather than building a single generic table, the UI reads annotation metadata from tags and selects the appropriate annotation component.
+
+## Tag Schema
+
+```
+feast.io/labeling-method → labeling method (table | entity-form | document-span | active-learning)
+feast.io/field-role:<field_name> → semantic role (label | metadata | content | content_ref | span_start | span_end)
+feast.io/label-values:<field_name> → comma-separated allowed values
+feast.io/label-widget:<field_name> → widget type (enum | binary | text | number)
+```
+
+These are parsed by `LabelView.annotation_config` (Python property) and served via the `/annotation-config/{name}` REST endpoint. The UI calls this endpoint to configure the Annotate tab dynamically.
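
Editorial sketch (not part of the commit): the tag schema above could be parsed along the following lines. The diff does not show the body of `LabelView.annotation_config`, so the `AnnotationConfig` container and the `parse_annotation_tags` function are hypothetical names used only for illustration. The `table` default follows the Profile Behavior table in this ADR.

```python
from dataclasses import dataclass, field
from typing import Dict, List

TAG_PREFIX = "feast.io/"


@dataclass
class AnnotationConfig:
    """Hypothetical container; the real return type is not shown in the diff."""

    labeling_method: str = "table"  # the ADR names `table` as the default profile
    field_roles: Dict[str, str] = field(default_factory=dict)
    label_values: Dict[str, List[str]] = field(default_factory=dict)
    label_widgets: Dict[str, str] = field(default_factory=dict)


def parse_annotation_tags(tags: Dict[str, str]) -> AnnotationConfig:
    """Collect feast.io/ annotation tags into a structured config."""
    config = AnnotationConfig()
    for key, value in tags.items():
        if not key.startswith(TAG_PREFIX):
            continue  # tags outside the feast.io/ namespace are ignored
        # "field-role:is_safe" -> ("field-role", "is_safe")
        name, _, field_name = key[len(TAG_PREFIX):].partition(":")
        if name == "labeling-method":
            config.labeling_method = value
        elif name == "field-role" and field_name:
            config.field_roles[field_name] = value
        elif name == "label-values" and field_name:
            # comma-separated allowed values, with blanks dropped
            config.label_values[field_name] = [
                v.strip() for v in value.split(",") if v.strip()
            ]
        elif name == "label-widget" and field_name:
            config.label_widgets[field_name] = value
    return config
```

Keeping the parser tolerant of unrelated tags matches the stated design goal: profiles ride on the existing `tags` dict, which may also hold user-defined keys.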
+
+## Profile Behavior
+
+| Profile | Default Method | Additional Methods | Active Learning |
+|---------|---------------|--------------------|-----------------|
+| `document-span` | Document Span | Review & Edit | Hidden (no entity pool) |
+| `entity-form` | Entity Form | Review & Edit, Active Learning | Available |
+| `active-learning` | Active Learning | Entity Form, Review & Edit | Primary |
+| `table` (default) | Review & Edit | Active Learning, Entity Form | Available |
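
Editorial sketch (not part of the commit): the Profile Behavior table can be read as a lookup from profile to an ordered list of annotation methods, with the default method first. The names `PROFILE_METHODS`, `available_methods`, and `default_method` are hypothetical, and falling back to `table` for an unrecognized profile is an assumption based on `table` being marked as the default.

```python
from typing import Dict, List

# Transcribed from the Profile Behavior table: the first entry is the
# default method, the remaining entries are the additional methods.
PROFILE_METHODS: Dict[str, List[str]] = {
    "document-span": ["Document Span", "Review & Edit"],
    "entity-form": ["Entity Form", "Review & Edit", "Active Learning"],
    "active-learning": ["Active Learning", "Entity Form", "Review & Edit"],
    "table": ["Review & Edit", "Active Learning", "Entity Form"],
}


def available_methods(profile: str) -> List[str]:
    """Return the methods for a profile, default first.

    Assumption: an unknown profile falls back to the `table` profile.
    """
    return list(PROFILE_METHODS.get(profile, PROFILE_METHODS["table"]))


def default_method(profile: str) -> str:
    """Return the method the UI would open first for this profile."""
    return available_methods(profile)[0]
```

Note that `document-span` has no Active Learning entry, which corresponds to the "Hidden (no entity pool)" cell in the table.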
+
+## Field Roles
+
+Field roles tell the annotation UI which schema fields are labels vs. structural metadata:
+
+- **`label`** — a field the annotator actively fills in. Gets appropriate widget (enum dropdown, binary toggle, text input).
+- **`metadata`** — contextual info displayed but not the primary annotation target.
+- **`content`** / **`content_ref`** — the text content or document reference for span labeling.
+- **`span_start`** / **`span_end`** — byte offsets for text span annotations.
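
Editorial sketch (not part of the commit): because `span_start` and `span_end` are described as byte offsets, they cannot be used directly as Python string indices once the content contains non-ASCII characters. The helper below is hypothetical and assumes UTF-8 encoding and a half-open `[span_start, span_end)` range, neither of which the diff states.

```python
def extract_span(content: str, span_start: int, span_end: int) -> str:
    """Return the annotated span, treating the offsets as byte positions.

    Assumptions: UTF-8 encoding and a half-open [span_start, span_end) range.
    """
    raw = content.encode("utf-8")
    if not 0 <= span_start <= span_end <= len(raw):
        raise ValueError("span offsets fall outside the content")
    # Decoding raises UnicodeDecodeError if an offset splits a multi-byte character.
    return raw[span_start:span_end].decode("utf-8")
```

For the string `"na\u00efve caf\u00e9"`, byte offsets 7 to 12 select the last word, while slicing the `str` with the same numbers would start one character late, since the earlier two-byte character shifts every later byte offset.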
+
+## Example Configurations
+
+### Entity Form (RLHF / Safety Review)
+
+```python
+tags={
+    "feast.io/labeling-method": "entity-form",
+    "feast.io/field-role:response_quality": "label",
+    "feast.io/field-role:is_safe": "label",
+    "feast.io/field-role:reviewer_notes": "metadata",
+    "feast.io/label-values:response_quality": "excellent,good,acceptable,poor,harmful",
+    "feast.io/label-values:is_safe": "1,0",
+    "feast.io/label-widget:response_quality": "enum",
+    "feast.io/label-widget:is_safe": "binary",
+    "feast.io/label-widget:reviewer_notes": "text",
+}
+```
+
+### Document Span (RAG Retrieval Evaluation)
+
+```python
+tags={
+    "feast.io/labeling-method": "document-span",
+    "feast.io/field-role:source_document": "content_ref",
+    "feast.io/field-role:chunk_text": "content",
+    "feast.io/field-role:chunk_start": "span_start",
+    "feast.io/field-role:chunk_end": "span_end",
+    "feast.io/field-role:relevance": "label",
+    "feast.io/field-role:ground_truth": "label",
+    "feast.io/label-values:relevance": "relevant,irrelevant",
+    "feast.io/label-widget:relevance": "binary",
+    "feast.io/label-widget:ground_truth": "text",
+}
+```
+
+### Table (Bulk Review / Correction)
+
+```python
+tags={
+    "feast.io/labeling-method": "table",
+    "feast.io/field-role:is_default": "label",
+    "feast.io/label-values:is_default": "1,0",
+    "feast.io/label-widget:is_default": "binary",
+}
+```
+
+---
+
 # Why a Separate Primitive Instead of Extending FeatureView?

 A natural question is: **why introduce a new type rather than adding optional label fields to `FeatureView`?**

-Structurally, a LabelView today is a schema + entities + PushSource — similar to a `FeatureView` backed by a `PushSource`. The runtime code paths (push, online read, historical join) are identical. One could argue that `labeler_field`, `conflict_policy`, and `retain_history` could be optional fields on `FeatureView` instead of a new type.
+Structurally, a LabelView today is a schema + entities + PushSource — similar to a `FeatureView` backed by a `PushSource`. The runtime code paths (push, online read, historical join) are identical. One could argue that `labeler_field` and `conflict_policy` could be optional fields on `FeatureView` instead of a new type.

 We chose a separate primitive for the following reasons:

@@ -355,12 +448,12 @@ The design follows the principle that **it is easier to merge two types later th

 ---

-# Alpha Limitations & Future Work
+# Limitations & Future Work

 | Limitation | Current Behavior | Future Direction |
 |---|---|---|
 | Conflict policy enforcement | `conflict_policy` is enforced for **offline store reads** (training data, UI, batch pipelines). Online store uses LAST_WRITE_WINS. | Optional online store enforcement for SQL-capable backends. |
-| History retention | `retain_history` is inherent to the offline store — all writes are appended. Online store keeps only the latest value per entity. | Optional online store multi-row retention for SQL-capable backends. |
+| History retention | The offline store always retains full write history (all writes are appended). Online store keeps only the latest value per entity. | Optional online store multi-row retention for SQL-capable backends. |
 | Labeler priority configuration | `LABELER_PRIORITY` accepts a `labeler_priorities` list via the conflict resolver. Not yet persisted in proto. | Add a `labeler_priorities` field to `LabelViewSpec`. |
 | Batch materialization | `batch_source` is implemented. LabelViews backed by a direct `DataSource` support `feast materialize`. LabelViews with only a `PushSource` (no `batch_source`) remain push-only. | N/A — supported in this release. |
 | Cross-version label joins | No special handling for joining labels across versions in historical retrieval. | Version-aware label joins for reproducible training. |
@@ -372,7 +465,7 @@ The design follows the principle that **it is easier to merge two types later th

 1. **Should conflict policy enforcement extend to the online store?** Currently enforced only for offline reads (training-first design). SQL-capable online stores could implement MAJORITY_VOTE natively; Redis would need application-level resolution. Most labeling use cases only need offline enforcement.

-2. **Should retain_history have a configurable retention window?** The offline store currently keeps unbounded history. A `max_history_entries` or `history_ttl` config could bound storage while preserving auditability.
+2. **Should history have a configurable retention window?** The offline store currently keeps unbounded history. A `max_history_entries` or `history_ttl` config could bound storage while preserving auditability.

 3. **Should FeatureService distinguish features from labels?** Today, FeatureService treats LabelViews and FeatureViews uniformly. A future enhancement could tag which projections are "labels" for downstream frameworks that need this distinction (e.g., auto-splitting X/y in training).
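
Editorial sketch (not part of the commit): the application-level resolution mentioned in the first open question could look roughly like the code below, applied to all writes for a single entity key. The ADR names the three policies but this diff does not define their exact semantics, so everything here is an assumption: the local `ConflictPolicy` stand-in, the hypothetical `LabelWrite` record and `resolve` function, counting only each labeler's latest label under `MAJORITY_VOTE`, and breaking ties by most recent write.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Sequence


class ConflictPolicy(Enum):
    """Local stand-in mirroring the policy names in the ADR."""

    LAST_WRITE_WINS = "last_write_wins"
    LABELER_PRIORITY = "labeler_priority"
    MAJORITY_VOTE = "majority_vote"


@dataclass(frozen=True)
class LabelWrite:
    """Hypothetical record of one label write for a single entity key."""

    labeler: str
    label: str
    timestamp: int


def resolve(
    writes: Sequence[LabelWrite],
    policy: ConflictPolicy,
    labeler_priorities: Optional[List[str]] = None,
) -> str:
    """Pick one label from conflicting writes for the same entity key."""
    if not writes:
        raise ValueError("no writes to resolve")
    if policy is ConflictPolicy.LAST_WRITE_WINS:
        return max(writes, key=lambda w: w.timestamp).label
    if policy is ConflictPolicy.LABELER_PRIORITY:
        order = list(labeler_priorities or [])

        def rank(write: LabelWrite) -> int:
            # Labelers missing from the list rank after all listed ones.
            return order.index(write.labeler) if write.labeler in order else len(order)

        # Best-ranked labeler wins; within a rank the newest write wins.
        return min(writes, key=lambda w: (rank(w), -w.timestamp)).label
    if policy is ConflictPolicy.MAJORITY_VOTE:
        latest: Dict[str, LabelWrite] = {}
        for write in sorted(writes, key=lambda w: w.timestamp):
            latest[write.labeler] = write  # one vote per labeler: the latest
        counts = Counter(w.label for w in latest.values())
        top = max(counts.values())
        tied = [w for w in latest.values() if counts[w.label] == top]
        return max(tied, key=lambda w: w.timestamp).label
    raise ValueError(f"unsupported policy: {policy}")
```

A store without server-side aggregation would have to run logic of this kind in the application after fetching every write for the key, which is the cost the open question weighs against offline-only enforcement.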