Skip to content

Latest commit

 

History

History
218 lines (162 loc) · 6.09 KB

File metadata and controls

218 lines (162 loc) · 6.09 KB

OpenLineage Integration

This module provides native integration between Feast and OpenLineage, enabling automatic data lineage tracking for ML feature engineering workflows.

Overview

When enabled, the integration automatically emits OpenLineage events for:

  • Registry changes - Events when feature views, feature services, and entities are applied
  • Feature materialization - START, COMPLETE, and FAIL events when features are materialized

No code changes required - just enable OpenLineage in your feature_store.yaml!

Installation

OpenLineage is an optional dependency. Install it with:

pip install openlineage-python

Or install Feast with the OpenLineage extra:

pip install feast[openlineage]

Configuration

Add the openlineage section to your feature_store.yaml:

project: my_project
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db

openlineage:
  enabled: true
  transport_type: http
  transport_url: http://localhost:5000
  transport_endpoint: api/v1/lineage
  namespace: feast
  emit_on_apply: true
  emit_on_materialize: true

Once configured, all Feast operations will automatically emit lineage events.

Environment Variables

You can also configure via environment variables:

export FEAST_OPENLINEAGE_ENABLED=true
export FEAST_OPENLINEAGE_TRANSPORT_TYPE=http
export FEAST_OPENLINEAGE_URL=http://localhost:5000
export FEAST_OPENLINEAGE_ENDPOINT=api/v1/lineage
export FEAST_OPENLINEAGE_NAMESPACE=feast

Usage

Once configured, lineage is tracked automatically:

from feast import FeatureStore
from datetime import datetime, timedelta

# Create FeatureStore - OpenLineage is initialized automatically if configured
fs = FeatureStore(repo_path="feature_repo")

# Apply operations emit lineage events automatically
fs.apply([driver_entity, driver_hourly_stats_view])

# Materialize emits START, COMPLETE/FAIL events automatically
fs.materialize(
    start_date=datetime.now() - timedelta(days=1),
    end_date=datetime.now()
)

Configuration Options

Option Default Description
enabled false Enable/disable OpenLineage integration
transport_type http Transport type: http, file, kafka
transport_url - URL for HTTP transport (required)
transport_endpoint api/v1/lineage API endpoint for HTTP transport
api_key - Optional API key for authentication
namespace feast Namespace for lineage events (uses project name if set to "feast")
producer feast Producer identifier
emit_on_apply true Emit events on feast apply
emit_on_materialize true Emit events on materialization

Lineage Graph Structure

When you run feast apply, Feast creates a lineage graph that matches the Feast UI:

DataSources ──┐
              ├──→ feast_feature_views_{project} ──→ FeatureViews
Entities ─────┘                                          │
                                                         │
                                                         ▼
                                     feature_service_{name} ──→ FeatureService

Jobs created:

  • feast_feature_views_{project}: Shows DataSources + Entities → FeatureViews
  • feature_service_{name}: Shows specific FeatureViews → FeatureService (one per service)

Datasets include:

  • Schema with feature names, types, descriptions, and tags
  • Feast-specific facets with metadata (TTL, entities, owner, etc.)
  • Documentation facets with descriptions

Transport Types

HTTP Transport (Recommended for Production)

openlineage:
  enabled: true
  transport_type: http
  transport_url: http://marquez:5000
  transport_endpoint: api/v1/lineage
  api_key: your-api-key  # Optional

File Transport

openlineage:
  enabled: true
  transport_type: file
  additional_config:
    log_file_path: openlineage_events.json

Kafka Transport

openlineage:
  enabled: true
  transport_type: kafka
  additional_config:
    bootstrap_servers: localhost:9092
    topic: openlineage.events

Custom Feast Facets

The integration includes custom Feast-specific facets in lineage events:

FeastFeatureViewFacet

Captures metadata about feature views:

  • name: Feature view name
  • ttl_seconds: Time-to-live in seconds
  • entities: List of entity names
  • features: List of feature names
  • online_enabled / offline_enabled: Store configuration
  • description: Feature view description
  • tags: Key-value tags

FeastFeatureServiceFacet

Captures metadata about feature services:

  • name: Feature service name
  • feature_views: List of feature view names
  • feature_count: Total number of features
  • description: Feature service description
  • tags: Key-value tags

FeastMaterializationFacet

Captures materialization run metadata:

  • feature_views: Feature views being materialized
  • start_date / end_date: Materialization window
  • rows_written: Number of rows written

Lineage Visualization

Use Marquez to visualize your Feast lineage:

# Start Marquez
docker run -p 5000:5000 -p 3000:3000 marquezproject/marquez

# Configure Feast to emit to Marquez (in feature_store.yaml)
# openlineage:
#   enabled: true
#   transport_type: http
#   transport_url: http://localhost:5000

Then access the Marquez UI at http://localhost:3000 to see your feature lineage.

Namespace Behavior

  • If namespace is set to "feast" (default): Uses project name as namespace (e.g., my_project)
  • If namespace is set to a custom value: Uses {namespace}/{project} (e.g., custom/my_project)

Feast to OpenLineage Mapping

Feast Concept OpenLineage Concept
DataSource InputDataset
FeatureView OutputDataset (of feature views job) / InputDataset (of feature service job)
Feature Schema field
Entity InputDataset
FeatureService OutputDataset
Materialization RunEvent (START/COMPLETE/FAIL)