feat: Native Unity Catalog integration for offline store and feature table registration #6499

@ntkathole

Problem

Teams running Unity Catalog can read UC-governed Delta tables today via the Spark offline store and SparkSource (e.g. catalog.schema.table), but there is no first-class integration for:

  1. UC feature table registration: feast apply does not register feature views as UC feature tables (primary keys, Features UI, Catalog Explorer discovery).
  2. UC-backed materialization: feast materialize writes to the online store but does not persist governed Delta tables back to UC or sync UC lineage.
  3. Databricks-aware configuration: users must hand-wire Spark session / cluster config instead of using a dedicated offline store type with UC defaults.

UC tables are Spark-accessible Delta tables, so the read path works generically. The gap is the UC governance/registration layer on top (feature table metadata, primary keys, discovery, lineage) that Unity Catalog provides natively.

Motivation / use cases

  • Platform teams want one feature definition in Feast (entities, TTLs, PIT joins, serving API) while keeping UC as the governance catalog for discovery and access control.
  • Data scientists expect feature tables to appear in Catalog Explorer / Features UI, not only in the Feast registry.
  • ML engineers need lineage from source UC tables → feature tables → models without maintaining parallel metadata.
  • Teams evaluating Feast vs Databricks Feature Store need a clear path when UC registration is a hard requirement.

Current state

| Capability | Supported today? |
| --- | --- |
| Read UC Delta tables via Spark | ✅ via Spark offline store + SparkSource |
| Point-in-time training joins | ✅ |
| UC feature table registration on feast apply | ❌ |
| Materialize to UC Delta + online store | ❌ (offline_write_batch not supported for Spark offline store) |
| Databricks-specific offline store config | ❌ |
| Import existing UC feature tables as Feast views | ❌ |

Related issues: #2406 (Delta/Iceberg/Hudi table formats), #764 (Databricks Spark runner, closed).

Proposed solution (phased)

L1 — Databricks-aware Spark offline store (read path)

  • New contrib type, e.g. databricks_uc, extending the Spark offline store
  • feature_store.yaml config for workspace host, default catalog/schema, auth
  • UnityCatalogSource (or extended SparkSource) with UC path validation
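The "UC path validation" bullet could look roughly like the sketch below. It is only an illustration: parse_uc_table and UCTableName are hypothetical names, and the identifier rule is deliberately stricter than what Unity Catalog accepts (backtick-quoted names are not handled). Defaults correspond to the catalog/schema keys proposed for feature_store.yaml.

```python
import re
from typing import NamedTuple, Optional

# Assumption: only unquoted identifiers made of letters, digits and underscores.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


class UCTableName(NamedTuple):
    """Hypothetical container for a three-level UC name."""
    catalog: str
    schema: str
    table: str

    @property
    def fqn(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


def parse_uc_table(
    name: str,
    default_catalog: Optional[str] = None,
    default_schema: Optional[str] = None,
) -> UCTableName:
    """Resolve `name` to catalog.schema.table, filling in configured defaults."""
    parts = name.split(".")
    if len(parts) == 1 and default_catalog and default_schema:
        parts = [default_catalog, default_schema, parts[0]]
    elif len(parts) == 2 and default_catalog:
        parts = [default_catalog, *parts]
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    for part in parts:
        if not _IDENT.match(part):
            raise ValueError(f"invalid identifier {part!r} in {name!r}")
    return UCTableName(*parts)
```

Failing at definition time rather than at query time means a mistyped two-level name surfaces during feast apply instead of in the middle of a Spark job.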

L2 — UC registration on feast apply

  • Hook after registry apply: register/update UC feature tables via FeatureEngineeringClient (or UC REST APIs)
  • Map Entity.join_keys → UC primary key constraints
  • Sync Feast tags, description, owner to UC table properties
  • Opt-out per feature view: uc_config.register_as_feature_table: false
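One way the post-apply hook might translate a feature view into UC changes is by emitting SQL, as sketched below. This is not the FeatureEngineeringClient route mentioned above; it assumes the target Delta table already exists and that the statements are run through a Spark session. ViewSpec, uc_registration_statements, and the feast.* property keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ViewSpec:
    """Hypothetical stand-in for the Feast fields the hook would read."""
    name: str
    join_keys: List[str]
    description: str = ""
    owner: str = ""
    tags: Dict[str, str] = field(default_factory=dict)
    uc_config: Dict[str, object] = field(default_factory=dict)


def _sql_str(value: str) -> str:
    # Backslash-escape for a single-quoted Spark SQL string literal.
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def uc_registration_statements(view: ViewSpec, catalog: str, schema: str) -> List[str]:
    """Build the SQL that would make an existing Delta table a UC feature table."""
    # Registration is on by default; a view opts out explicitly.
    if view.uc_config.get("register_as_feature_table", True) is False:
        return []
    table = f"{catalog}.{schema}.{view.uc_config.get('table', view.name)}"
    stmts = []
    # Primary key columns must be NOT NULL before the constraint is added.
    for key in view.join_keys:
        stmts.append(f"ALTER TABLE {table} ALTER COLUMN {key} SET NOT NULL")
    stmts.append(
        f"ALTER TABLE {table} ADD CONSTRAINT {view.name}_pk "
        f"PRIMARY KEY ({', '.join(view.join_keys)})"
    )
    if view.description:
        stmts.append(f"COMMENT ON TABLE {table} IS {_sql_str(view.description)}")
    # Assumption: owner and tags are mirrored as feast.* table properties.
    props = {f"feast.tag.{k}": v for k, v in view.tags.items()}
    if view.owner:
        props["feast.owner"] = view.owner
    if props:
        pairs = ", ".join(f"{_sql_str(k)} = {_sql_str(v)}" for k, v in sorted(props.items()))
        stmts.append(f"ALTER TABLE {table} SET TBLPROPERTIES ({pairs})")
    return stmts
```

Returning statements instead of executing them keeps the mapping testable without a workspace. A real hook would also need to handle a primary key constraint that already exists.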

L3 — UC-backed materialization

  • Extend materialization to MERGE/append into catalog.schema.<feature_view> Delta tables
  • Continue writing to configured online store for low-latency serving
  • Emit OpenLineage events with UC FQNs
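For the MERGE path, a minimal sketch of the statement a materialization job might issue is shown below. It assumes the batch has been staged as a temporary view and that a row is identified by its join keys plus the event timestamp. build_merge_sql and the staged view name are hypothetical.

```python
from typing import Sequence


def build_merge_sql(
    target: str,
    staged_view: str,
    join_keys: Sequence[str],
    timestamp_field: str,
) -> str:
    """Build a Delta MERGE that upserts staged rows into the UC feature table."""
    if not join_keys:
        raise ValueError("at least one join key is required")
    # Match on entity keys plus event timestamp so history is preserved
    # and re-running the same window stays idempotent.
    on = " AND ".join(f"t.{c} = s.{c}" for c in [*join_keys, timestamp_field])
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {staged_view} AS s\n"
        f"ON {on}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )
```

Matching on the timestamp as well as the keys keeps one row per entity per event time. A latest-value-only table would instead match on the keys alone.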

L4 (optional) — Bidirectional sync

  • feast import-uc-table to scaffold FeatureView from existing UC feature tables
  • UC lineage bridge for apply/materialize events
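A scaffolding command such as feast import-uc-table could be little more than a template over table metadata. The sketch below assumes the column names, Spark types, and primary keys have already been fetched (for example from DESCRIBE TABLE). scaffold_feature_view and the type map are illustrative, and the generated text refers to the UnityCatalogSource proposed in this issue, which does not exist yet.

```python
from typing import Dict, List

# Assumption: a small subset of Spark SQL types mapped to feast.types names.
SPARK_TO_FEAST = {
    "int": "Int32",
    "bigint": "Int64",
    "float": "Float32",
    "double": "Float64",
    "string": "String",
    "boolean": "Bool",
    "timestamp": "UnixTimestamp",
}


def scaffold_feature_view(
    table_fqn: str,
    columns: Dict[str, str],
    primary_keys: List[str],
    timestamp_field: str,
) -> str:
    """Return Python source text for a FeatureView mirroring a UC feature table."""
    view_name = table_fqn.split(".")[-1]
    fields = []
    for col, spark_type in columns.items():
        # Keys and the event timestamp are not features.
        if col in primary_keys or col == timestamp_field:
            continue
        if spark_type not in SPARK_TO_FEAST:
            raise ValueError(f"unsupported type {spark_type!r} for column {col!r}")
        fields.append(f'        Field(name="{col}", dtype={SPARK_TO_FEAST[spark_type]}),')
    lines = [
        f"{view_name} = FeatureView(",
        f'    name="{view_name}",',
        f"    entities=[{', '.join(primary_keys)}],  # assumes Entity objects named after the keys",
        "    source=UnityCatalogSource(",
        f'        table="{table_fqn}",',
        f'        timestamp_field="{timestamp_field}",',
        "    ),",
        "    schema=[",
        *fields,
        "    ],",
        ")",
    ]
    return "\n".join(lines)
```

The output is a starting point for review, not a finished definition: TTLs, entity declarations, and imports would still have to be filled in by hand.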

Example API sketch

# feature_store.yaml
offline_store:
  type: databricks_uc
  workspace_host: https://adb-xxx.azuredatabricks.net
  catalog: prod_ml
  schema: features
  uc_registration:
    enabled: true
    on_apply: true

# Feature repo definition (Python)
customer_features = FeatureView(
    name="customer_features",
    entities=[customer],
    source=UnityCatalogSource(
        table="prod_raw.bronze.transactions",
        timestamp_field="event_timestamp",
    ),
    schema=[...],
    uc_config={
        "catalog": "prod_ml",
        "schema": "features",
        "table": "customer_features",
        "register_as_feature_table": True,
        "materialize_offline": True,
    },
)

Expected behavior

feast apply
  → Feast registry updated (existing)
  → UC: create/register feature table, set primary key, sync metadata

feast materialize-incremental <end>
  → Read from UC sources via Spark
  → Write Delta to UC feature table (new)
  → Write to online store (existing)
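The materialize flow above can be read as a fixed ordering of three steps. The sketch below makes that ordering explicit with injected callables. materialize_with_uc is a hypothetical name, and writing to UC before the online store (so that a failed Delta write leaves the online store untouched) is an assumption, not something the proposal specifies.

```python
from typing import Callable, List

Rows = List[dict]


def materialize_with_uc(
    read_source: Callable[[], Rows],
    write_uc_delta: Callable[[Rows], None],
    write_online: Callable[[Rows], None],
) -> int:
    """Run one materialization pass and return the number of rows written."""
    rows = read_source()      # read from UC sources via Spark
    write_uc_delta(rows)      # new step: persist to the UC feature table
    write_online(rows)        # existing step: only reached if the Delta write succeeded
    return len(rows)
```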
