| 1 | +# OpenLineage Integration |
| 2 | + |
| 3 | +This module provides **native integration** between Feast and [OpenLineage](https://openlineage.io/), enabling automatic data lineage tracking for ML feature engineering workflows. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +When enabled, the integration **automatically** emits OpenLineage events for: |
| 8 | + |
| 9 | +- **Registry changes** - Events when feature views, feature services, and entities are applied |
| 10 | +- **Feature materialization** - START, COMPLETE, and FAIL events when features are materialized |
| 11 | + |
| 12 | +**No code changes required** - just enable OpenLineage in your `feature_store.yaml`! |
| 13 | + |
| 14 | +## Installation |
| 15 | + |
| 16 | +OpenLineage is an optional dependency. Install it with: |
| 17 | + |
| 18 | +```bash |
| 19 | +pip install openlineage-python |
| 20 | +``` |
| 21 | + |
| 22 | +Or install Feast with the OpenLineage extra: |
| 23 | + |
| 24 | +```bash |
| 25 | +pip install 'feast[openlineage]' |
| 26 | +``` |
| 27 | + |
| 28 | +## Configuration |
| 29 | + |
| 30 | +Add the `openlineage` section to your `feature_store.yaml`: |
| 31 | + |
| 32 | +```yaml |
| 33 | +project: my_project |
| 34 | +registry: data/registry.db |
| 35 | +provider: local |
| 36 | +online_store: |
| 37 | + type: sqlite |
| 38 | + path: data/online_store.db |
| 39 | + |
| 40 | +openlineage: |
| 41 | + enabled: true |
| 42 | + transport_type: http |
| 43 | + transport_url: http://localhost:5000 |
| 44 | + transport_endpoint: api/v1/lineage |
| 45 | + namespace: feast |
| 46 | + emit_on_apply: true |
| 47 | + emit_on_materialize: true |
| 48 | +``` |
| 49 | + |
| 50 | +Once configured, `feast apply` and materialization runs emit lineage events automatically. |
| 51 | + |
| 52 | +### Environment Variables |
| 53 | + |
| 54 | +You can also configure the integration through environment variables: |
| 55 | + |
| 56 | +```bash |
| 57 | +export FEAST_OPENLINEAGE_ENABLED=true |
| 58 | +export FEAST_OPENLINEAGE_TRANSPORT_TYPE=http |
| 59 | +export FEAST_OPENLINEAGE_URL=http://localhost:5000 |
| 60 | +export FEAST_OPENLINEAGE_ENDPOINT=api/v1/lineage |
| 61 | +export FEAST_OPENLINEAGE_NAMESPACE=feast |
| 62 | +``` |
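As a rough illustration of how these variables line up with the YAML keys, the sketch below collects whichever ones are set into a dictionary. The `ENV_TO_OPTION` mapping and the `openlineage_settings_from_env` helper are hypothetical and not part of Feast; the pairing of each variable with a `feature_store.yaml` key is an assumption inferred from the names.

```python
import os

# Assumed pairing of environment variables to feature_store.yaml keys,
# inferred from the names above; not taken from Feast's source.
ENV_TO_OPTION = {
    "FEAST_OPENLINEAGE_ENABLED": "enabled",
    "FEAST_OPENLINEAGE_TRANSPORT_TYPE": "transport_type",
    "FEAST_OPENLINEAGE_URL": "transport_url",
    "FEAST_OPENLINEAGE_ENDPOINT": "transport_endpoint",
    "FEAST_OPENLINEAGE_NAMESPACE": "namespace",
}


def openlineage_settings_from_env(environ=None):
    """Return the OpenLineage options that are set in the environment (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    return {
        option: environ[name]
        for name, option in ENV_TO_OPTION.items()
        if name in environ
    }
```

Calling `openlineage_settings_from_env()` before starting Feast is one way to check which values the process will actually see.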
| 63 | + |
| 64 | +## Usage |
| 65 | + |
| 66 | +Once configured, lineage is tracked automatically: |
| 67 | + |
| 68 | +```python |
| 69 | +from datetime import datetime, timedelta |
| 70 | + |
| 71 | +from feast import FeatureStore |
| 72 | + |
| 73 | +# Create FeatureStore - OpenLineage is initialized automatically if configured |
| 74 | +fs = FeatureStore(repo_path="feature_repo") |
| 75 | + |
| 76 | +# Apply emits lineage events automatically; driver_entity and |
| 77 | +# driver_hourly_stats_view are assumed to be defined in your feature repo |
| 78 | +fs.apply([driver_entity, driver_hourly_stats_view]) |
| 79 | + |
| 80 | +# Materialize emits a START event, then COMPLETE or FAIL |
| 81 | +end_date = datetime.now() |
| 82 | +start_date = end_date - timedelta(days=1) |
| 83 | +fs.materialize(start_date=start_date, end_date=end_date) |
| 84 | +``` |
| 85 | + |
| 86 | +## Configuration Options |
| 87 | + |
| 88 | +| Option | Default | Description | |
| 89 | +|--------|---------|-------------| |
| 90 | +| `enabled` | `false` | Enable/disable OpenLineage integration | |
| 91 | +| `transport_type` | `http` | Transport type: `http`, `file`, `kafka` | |
| 92 | +| `transport_url` | - | URL for HTTP transport (required when `transport_type` is `http`) | |
| 93 | +| `transport_endpoint` | `api/v1/lineage` | API endpoint for HTTP transport | |
| 94 | +| `api_key` | - | Optional API key for authentication | |
| 95 | +| `namespace` | `feast` | Namespace for lineage events (uses project name if set to "feast") | |
| 96 | +| `producer` | `feast` | Producer identifier | |
| 97 | +| `emit_on_apply` | `true` | Emit events on `feast apply` | |
| 98 | +| `emit_on_materialize` | `true` | Emit events on materialization | |
| 99 | + |
| 100 | +## Lineage Graph Structure |
| 101 | + |
| 102 | +When you run `feast apply`, Feast creates a lineage graph that matches the Feast UI: |
| 103 | + |
| 104 | +``` |
| 105 | +DataSources ──┐ |
| 106 | +              ├──→ feast_feature_views_{project} ──→ FeatureViews |
| 107 | +Entities ─────┘                                           │ |
| 108 | +                                                          │ |
| 109 | +                                                          ▼ |
| 110 | +                                               feature_service_{name} ──→ FeatureService |
| 111 | +``` |
| 112 | + |
| 113 | +**Jobs created:** |
| 114 | +- `feast_feature_views_{project}`: Shows DataSources + Entities → FeatureViews |
| 115 | +- `feature_service_{name}`: Shows specific FeatureViews → FeatureService (one per service) |
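
The naming pattern above can be sketched as a small helper, which may be handy when searching for these jobs in a lineage backend. The function names are hypothetical; only the two name templates come from the list above.

```python
def feature_views_job_name(project: str) -> str:
    """Name of the single job linking DataSources and Entities to FeatureViews."""
    return f"feast_feature_views_{project}"


def feature_service_job_name(service_name: str) -> str:
    """Name of the per-service job linking FeatureViews to a FeatureService."""
    return f"feature_service_{service_name}"


def expected_job_names(project: str, service_names: list[str]) -> list[str]:
    """All job names expected for one project: one views job plus one per service."""
    return [feature_views_job_name(project)] + [
        feature_service_job_name(name) for name in service_names
    ]
```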
| 116 | + |
| 117 | +**Datasets include:** |
| 118 | +- Schema with feature names, types, descriptions, and tags |
| 119 | +- Feast-specific facets with metadata (TTL, entities, owner, etc.) |
| 120 | +- Documentation facets with descriptions |
| 121 | + |
| 122 | +## Transport Types |
| 123 | + |
| 124 | +### HTTP Transport (Recommended for Production) |
| 125 | + |
| 126 | +```yaml |
| 127 | +openlineage: |
| 128 | + enabled: true |
| 129 | + transport_type: http |
| 130 | + transport_url: http://marquez:5000 |
| 131 | + transport_endpoint: api/v1/lineage |
| 132 | + api_key: your-api-key # Optional |
| 133 | +``` |
| 134 | + |
| 135 | +### File Transport |
| 136 | + |
| 137 | +```yaml |
| 138 | +openlineage: |
| 139 | + enabled: true |
| 140 | + transport_type: file |
| 141 | + additional_config: |
| 142 | + log_file_path: openlineage_events.json |
| 143 | +``` |
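
To inspect what the file transport wrote, a short script along these lines may help. It is a sketch that assumes events are appended to a single file as one JSON object per line; if your setup writes a separate file per event, adapt the reading logic. `count_event_types` is a hypothetical helper, not a Feast or OpenLineage function.

```python
import json
from collections import Counter
from pathlib import Path


def count_event_types(log_file_path):
    """Count events by eventType, assuming one JSON event per line."""
    counts = Counter()
    for line in Path(log_file_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        counts[event.get("eventType", "UNKNOWN")] += 1
    return counts
```

For example, `count_event_types("openlineage_events.json")` returns a `Counter` keyed by event type for the file configured above.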
| 144 | + |
| 145 | +### Kafka Transport |
| 146 | + |
| 147 | +```yaml |
| 148 | +openlineage: |
| 149 | + enabled: true |
| 150 | + transport_type: kafka |
| 151 | + additional_config: |
| 152 | + bootstrap_servers: localhost:9092 |
| 153 | + topic: openlineage.events |
| 154 | +``` |
| 155 | + |
| 156 | +## Custom Feast Facets |
| 157 | + |
| 158 | +The integration includes custom Feast-specific facets in lineage events: |
| 159 | + |
| 160 | +### FeastFeatureViewFacet |
| 161 | + |
| 162 | +Captures metadata about feature views: |
| 163 | +- `name`: Feature view name |
| 164 | +- `ttl_seconds`: Time-to-live in seconds |
| 165 | +- `entities`: List of entity names |
| 166 | +- `features`: List of feature names |
| 167 | +- `online_enabled` / `offline_enabled`: Store configuration |
| 168 | +- `description`: Feature view description |
| 169 | +- `tags`: Key-value tags |
| 170 | + |
| 171 | +### FeastFeatureServiceFacet |
| 172 | + |
| 173 | +Captures metadata about feature services: |
| 174 | +- `name`: Feature service name |
| 175 | +- `feature_views`: List of feature view names |
| 176 | +- `feature_count`: Total number of features |
| 177 | +- `description`: Feature service description |
| 178 | +- `tags`: Key-value tags |
| 179 | + |
| 180 | +### FeastMaterializationFacet |
| 181 | + |
| 182 | +Captures materialization run metadata: |
| 183 | +- `feature_views`: Feature views being materialized |
| 184 | +- `start_date` / `end_date`: Materialization window |
| 185 | +- `rows_written`: Number of rows written |
| 186 | + |
| 187 | +## Lineage Visualization |
| 188 | + |
| 189 | +Use [Marquez](https://marquezproject.ai/) to visualize your Feast lineage: |
| 190 | + |
| 191 | +```bash |
| 192 | +# Start Marquez |
| 193 | +docker run -p 5000:5000 -p 3000:3000 marquezproject/marquez |
| 194 | + |
| 195 | +# Configure Feast to emit to Marquez (in feature_store.yaml) |
| 196 | +# openlineage: |
| 197 | +# enabled: true |
| 198 | +# transport_type: http |
| 199 | +# transport_url: http://localhost:5000 |
| 200 | +``` |
| 201 | + |
| 202 | +Then access the Marquez UI at http://localhost:3000 to see your feature lineage. |
| 203 | + |
| 204 | +## Namespace Behavior |
| 205 | + |
| 206 | +- If `namespace` is set to `"feast"` (default): Uses project name as namespace (e.g., `my_project`) |
| 207 | +- If `namespace` is set to a custom value: Uses `{namespace}/{project}` (e.g., `custom/my_project`) |
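
The two rules above amount to the following logic. `resolve_namespace` is a hypothetical name used for illustration; it restates the documented behavior and is not a copy of Feast's implementation.

```python
def resolve_namespace(namespace: str, project: str) -> str:
    """Return the effective OpenLineage namespace for a Feast project."""
    if namespace == "feast":
        # Default value: the project name alone is used as the namespace.
        return project
    # Custom value: the project name is appended to the configured namespace.
    return f"{namespace}/{project}"
```

With the defaults, a project called `my_project` is reported under `my_project`; with `namespace: custom` it is reported under `custom/my_project`.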
| 208 | + |
| 209 | +## Feast to OpenLineage Mapping |
| 210 | + |
| 211 | +| Feast Concept | OpenLineage Concept | |
| 212 | +|---------------|---------------------| |
| 213 | +| DataSource | InputDataset | |
| 214 | +| FeatureView | OutputDataset (of feature views job) / InputDataset (of feature service job) | |
| 215 | +| Feature | Schema field | |
| 216 | +| Entity | InputDataset | |
| 217 | +| FeatureService | OutputDataset | |
| 218 | +| Materialization | RunEvent (START/COMPLETE/FAIL) | |
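
To make the last row concrete, the sketch below builds a START and a COMPLETE event for one materialization run as plain dictionaries. The top-level field names follow the OpenLineage `RunEvent` shape; a real event also carries a `schemaURL`, input and output datasets, and facets, which are omitted here. The `run_event` helper and the job name `materialize_example` are hypothetical and do not reflect the exact events Feast emits.

```python
import uuid
from datetime import datetime, timezone

VALID_EVENT_TYPES = {"START", "COMPLETE", "FAIL"}


def run_event(event_type, run_id, namespace, job_name, producer="feast"):
    """Build a minimal RunEvent-shaped dict (illustrative, not a full event)."""
    if event_type not in VALID_EVENT_TYPES:
        raise ValueError(f"unsupported event type: {event_type}")
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": job_name},
        "producer": producer,
    }


# Both events share one runId, which is how a backend ties them to the same run.
run_id = str(uuid.uuid4())
start = run_event("START", run_id, "my_project", "materialize_example")
complete = run_event("COMPLETE", run_id, "my_project", "materialize_example")
```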