Commit df70d8d

feat: Added support for OpenLineage integration (#5884)
* feat: Added support for OpenLineage integration
  Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
* feat: Added openlineage in requirements
  Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
* feat: Keep event type as complete instead of other
  Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
* feat: Added blog post for OpenLineage integration
  Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
* Update docs/reference/openlineage.md
  Co-authored-by: Francisco Javier Arceo <arceofrancisco@gmail.com>

---------

Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
Co-authored-by: Francisco Javier Arceo <arceofrancisco@gmail.com>
1 parent 52458fc commit df70d8d

33 files changed, with 3868 additions and 522 deletions

docs/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -163,6 +163,7 @@
 * [\[Alpha\] Vector Database](reference/alpha-vector-database.md)
 * [\[Alpha\] Data quality monitoring](reference/dqm.md)
 * [\[Alpha\] Streaming feature computation with Denormalized](reference/denormalized.md)
+* [OpenLineage Integration](reference/openlineage.md)
 * [Feast CLI reference](reference/feast-cli-commands.md)
 * [Python API reference](http://rtd.feast.dev)
 * [Usage](reference/usage.md)

docs/reference/openlineage.md

Lines changed: 218 additions & 0 deletions
@@ -0,0 +1,218 @@
# OpenLineage Integration

This module provides **native integration** between Feast and [OpenLineage](https://openlineage.io/), enabling automatic data lineage tracking for ML feature engineering workflows.

## Overview

When enabled, the integration **automatically** emits OpenLineage events for:

- **Registry changes** - Events when feature views, feature services, and entities are applied
- **Feature materialization** - START, COMPLETE, and FAIL events when features are materialized

**No code changes required** - just enable OpenLineage in your `feature_store.yaml`!

## Installation

OpenLineage is an optional dependency. Install it with:

```bash
pip install openlineage-python
```

Or install Feast with the OpenLineage extra:

```bash
pip install feast[openlineage]
```

## Configuration

Add the `openlineage` section to your `feature_store.yaml`:

```yaml
project: my_project
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db

openlineage:
  enabled: true
  transport_type: http
  transport_url: http://localhost:5000
  transport_endpoint: api/v1/lineage
  namespace: feast
  emit_on_apply: true
  emit_on_materialize: true
```

Once configured, `feast apply` and materialization runs emit lineage events automatically.

### Environment Variables

You can also configure via environment variables:

```bash
export FEAST_OPENLINEAGE_ENABLED=true
export FEAST_OPENLINEAGE_TRANSPORT_TYPE=http
export FEAST_OPENLINEAGE_URL=http://localhost:5000
export FEAST_OPENLINEAGE_ENDPOINT=api/v1/lineage
export FEAST_OPENLINEAGE_NAMESPACE=feast
```

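
As a rough illustration of how these variables line up with the `feature_store.yaml` keys, the sketch below reads them into a plain dictionary. The helper name `openlineage_config_from_env` and the variable-to-key mapping are assumptions inferred from the names above; this is not Feast's actual loading code.

```python
import os

# Maps each environment variable listed above to the feature_store.yaml key
# it appears to correspond to. The mapping is inferred from the names only.
ENV_TO_KEY = {
    "FEAST_OPENLINEAGE_ENABLED": "enabled",
    "FEAST_OPENLINEAGE_TRANSPORT_TYPE": "transport_type",
    "FEAST_OPENLINEAGE_URL": "transport_url",
    "FEAST_OPENLINEAGE_ENDPOINT": "transport_endpoint",
    "FEAST_OPENLINEAGE_NAMESPACE": "namespace",
}


def openlineage_config_from_env(environ=None):
    """Hypothetical helper: collect OpenLineage settings from the environment."""
    environ = os.environ if environ is None else environ
    config = {}
    for var, key in ENV_TO_KEY.items():
        if var in environ:
            config[key] = environ[var]
    # Interpret the enabled flag as a boolean; an unset flag means disabled.
    config["enabled"] = str(config.get("enabled", "false")).lower() == "true"
    return config
```

Passing a dictionary instead of relying on `os.environ` keeps the helper easy to exercise in isolation.
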
## Usage

Once configured, lineage is tracked automatically:

```python
from datetime import datetime, timedelta

from feast import FeatureStore

# Create FeatureStore - OpenLineage is initialized automatically if configured
fs = FeatureStore(repo_path="feature_repo")

# Apply operations emit lineage events automatically.
# driver_entity and driver_hourly_stats_view are assumed to be defined
# elsewhere, for example in your feature repository's definitions.
fs.apply([driver_entity, driver_hourly_stats_view])

# Materialize emits a START event, then COMPLETE or FAIL
end_date = datetime.now()
fs.materialize(
    start_date=end_date - timedelta(days=1),
    end_date=end_date,
)
```

## Configuration Options

| Option | Default | Description |
|--------|---------|-------------|
| `enabled` | `false` | Enable/disable OpenLineage integration |
| `transport_type` | `http` | Transport type: `http`, `file`, `kafka` |
| `transport_url` | - | URL for HTTP transport (required) |
| `transport_endpoint` | `api/v1/lineage` | API endpoint for HTTP transport |
| `api_key` | - | Optional API key for authentication |
| `namespace` | `feast` | Namespace for lineage events (uses project name if set to "feast") |
| `producer` | `feast` | Producer identifier |
| `emit_on_apply` | `true` | Emit events on `feast apply` |
| `emit_on_materialize` | `true` | Emit events on materialization |

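
To make the defaults in the table concrete, here is a minimal sketch that mirrors them in a dataclass with a small validation step. The class name `OpenLineageOptions` and its `validate` method are hypothetical and exist only for illustration; Feast's real configuration class may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpenLineageOptions:
    """Hypothetical mirror of the options table; not Feast's real config class."""

    enabled: bool = False
    transport_type: str = "http"
    transport_url: Optional[str] = None
    transport_endpoint: str = "api/v1/lineage"
    api_key: Optional[str] = None
    namespace: str = "feast"
    producer: str = "feast"
    emit_on_apply: bool = True
    emit_on_materialize: bool = True

    def validate(self):
        """Raise ValueError for combinations the table rules out."""
        if self.transport_type not in ("http", "file", "kafka"):
            raise ValueError(f"unsupported transport_type: {self.transport_type}")
        # The table marks transport_url as required for the HTTP transport.
        if self.enabled and self.transport_type == "http" and not self.transport_url:
            raise ValueError("transport_url is required for the http transport")
```

With no arguments the object carries the documented defaults, so `OpenLineageOptions().validate()` passes because the integration is disabled.
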
## Lineage Graph Structure

When you run `feast apply`, Feast creates a lineage graph that matches the Feast UI:

```
DataSources ──┐
              ├──→ feast_feature_views_{project} ──→ FeatureViews
Entities ─────┘                                           │
                                                          │
                                                          ▼
                                               feature_service_{name} ──→ FeatureService
```

**Jobs created:**
- `feast_feature_views_{project}`: Shows DataSources + Entities → FeatureViews
- `feature_service_{name}`: Shows specific FeatureViews → FeatureService (one per service)

**Datasets include:**
- Schema with feature names, types, descriptions, and tags
- Feast-specific facets with metadata (TTL, entities, owner, etc.)
- Documentation facets with descriptions

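
The job layout described in this section can be sketched as plain data. The function `build_lineage_jobs` below is a hypothetical helper, not part of Feast; it only reproduces the naming pattern and the input/output direction from the diagram.

```python
def build_lineage_jobs(project, data_sources, entities, feature_views, feature_services):
    """Hypothetical helper: describe the jobs from the diagram as plain dicts.

    feature_services maps a service name to the feature view names it uses.
    """
    jobs = {
        # One job per project: data sources and entities feed the feature views.
        f"feast_feature_views_{project}": {
            "inputs": list(data_sources) + list(entities),
            "outputs": list(feature_views),
        }
    }
    # One job per feature service: its feature views feed the service.
    for service_name, view_names in feature_services.items():
        jobs[f"feature_service_{service_name}"] = {
            "inputs": list(view_names),
            "outputs": [service_name],
        }
    return jobs


jobs = build_lineage_jobs(
    project="my_project",
    data_sources=["driver_stats_source"],
    entities=["driver_id"],
    feature_views=["driver_hourly_stats"],
    feature_services={"driver_stats_service": ["driver_hourly_stats"]},
)
```
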
## Transport Types

### HTTP Transport (Recommended for Production)

```yaml
openlineage:
  enabled: true
  transport_type: http
  transport_url: http://marquez:5000
  transport_endpoint: api/v1/lineage
  api_key: your-api-key  # Optional
```

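
To show how the HTTP settings fit together, the sketch below combines `transport_url` and `transport_endpoint` into one request using only the standard library. The helper `build_lineage_request` is hypothetical, and sending the API key as a bearer token is an assumption; in practice the OpenLineage client handles delivery for you.

```python
import json
import urllib.request


def build_lineage_request(transport_url, transport_endpoint, event, api_key=None):
    """Hypothetical helper: prepare (but do not send) an HTTP POST for one event."""
    # Join base URL and endpoint with exactly one slash between them.
    url = transport_url.rstrip("/") + "/" + transport_endpoint.lstrip("/")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the API key travels as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_lineage_request(
    "http://marquez:5000", "api/v1/lineage", {"eventType": "START"}, api_key="your-api-key"
)
```

Nothing is sent until the request is passed to `urllib.request.urlopen`, which makes the sketch safe to run without a lineage backend.
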
### File Transport

```yaml
openlineage:
  enabled: true
  transport_type: file
  additional_config:
    log_file_path: openlineage_events.json
```

### Kafka Transport

```yaml
openlineage:
  enabled: true
  transport_type: kafka
  additional_config:
    bootstrap_servers: localhost:9092
    topic: openlineage.events
```

## Custom Feast Facets

The integration includes custom Feast-specific facets in lineage events:

### FeastFeatureViewFacet

Captures metadata about feature views:
- `name`: Feature view name
- `ttl_seconds`: Time-to-live in seconds
- `entities`: List of entity names
- `features`: List of feature names
- `online_enabled` / `offline_enabled`: Store configuration
- `description`: Feature view description
- `tags`: Key-value tags

### FeastFeatureServiceFacet

Captures metadata about feature services:
- `name`: Feature service name
- `feature_views`: List of feature view names
- `feature_count`: Total number of features
- `description`: Feature service description
- `tags`: Key-value tags

### FeastMaterializationFacet

Captures materialization run metadata:
- `feature_views`: Feature views being materialized
- `start_date` / `end_date`: Materialization window
- `rows_written`: Number of rows written

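
For a sense of where the materialization fields could sit inside a run event, here is a minimal sketch that builds an OpenLineage-style event as a plain dictionary. The helper `materialization_event`, the facet key `feast_materialization`, and the job name `materialize` are all assumptions made for illustration; the events Feast actually emits may be shaped differently.

```python
import uuid
from datetime import datetime, timezone


def materialization_event(event_type, run_id, namespace, job_name,
                          feature_views, start_date, end_date, rows_written=None):
    """Hypothetical helper: build an OpenLineage-style run event as a dict."""
    if event_type not in ("START", "COMPLETE", "FAIL"):
        raise ValueError(f"unexpected event type: {event_type}")
    facet = {
        "feature_views": list(feature_views),
        "start_date": start_date.isoformat(),
        "end_date": end_date.isoformat(),
    }
    # rows_written is only known once the run has finished.
    if rows_written is not None:
        facet["rows_written"] = rows_written
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": "feast",
        "job": {"namespace": namespace, "name": job_name},
        # The facet key below is an assumed name, chosen for illustration.
        "run": {"runId": run_id, "facets": {"feast_materialization": facet}},
    }


# START and COMPLETE share one run ID so a backend can pair them up.
run_id = str(uuid.uuid4())
window = (datetime(2024, 1, 1), datetime(2024, 1, 2))
start = materialization_event(
    "START", run_id, "my_project", "materialize", ["driver_hourly_stats"], *window
)
done = materialization_event(
    "COMPLETE", run_id, "my_project", "materialize", ["driver_hourly_stats"], *window,
    rows_written=100,
)
```
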
## Lineage Visualization

Use [Marquez](https://marquezproject.ai/) to visualize your Feast lineage:

```bash
# Start Marquez
docker run -p 5000:5000 -p 3000:3000 marquezproject/marquez

# Configure Feast to emit to Marquez (in feature_store.yaml)
# openlineage:
#   enabled: true
#   transport_type: http
#   transport_url: http://localhost:5000
```

Then access the Marquez UI at http://localhost:3000 to see your feature lineage.

## Namespace Behavior

- If `namespace` is set to `"feast"` (default): Uses project name as namespace (e.g., `my_project`)
- If `namespace` is set to a custom value: Uses `{namespace}/{project}` (e.g., `custom/my_project`)

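
The two rules above amount to a small function. The name `resolve_namespace` is hypothetical and used only to illustrate the behavior described here:

```python
def resolve_namespace(namespace, project):
    """Hypothetical helper mirroring the two namespace rules above."""
    # The default value "feast" is replaced by the project name.
    if namespace == "feast":
        return project
    # Any custom value is combined with the project name.
    return f"{namespace}/{project}"


default_ns = resolve_namespace("feast", "my_project")
custom_ns = resolve_namespace("custom", "my_project")
```
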
## Feast to OpenLineage Mapping

| Feast Concept | OpenLineage Concept |
|---------------|---------------------|
| DataSource | InputDataset |
| FeatureView | OutputDataset (of feature views job) / InputDataset (of feature service job) |
| Feature | Schema field |
| Entity | InputDataset |
| FeatureService | OutputDataset |
| Materialization | RunEvent (START/COMPLETE/FAIL) |

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
# Feast OpenLineage Integration Example

This example demonstrates Feast's **native OpenLineage integration** for automatic data lineage tracking.

For full documentation, see the [OpenLineage Reference](../../docs/reference/openlineage.md).

## Prerequisites

```bash
pip install feast[openlineage]
```

## Running the Demo

1. Start Marquez:
   ```bash
   docker run -p 5000:5000 -p 3000:3000 marquezproject/marquez
   ```

2. Run the demo:
   ```bash
   python openlineage_demo.py --url http://localhost:5000
   ```

3. View lineage at http://localhost:3000

## What the Demo Shows

The demo creates a sample feature repository and demonstrates:

- **Entity**: `driver_id`
- **DataSource**: `driver_stats_source` (Parquet file)
- **FeatureView**: `driver_hourly_stats` with features like conversion rate, acceptance rate
- **FeatureService**: `driver_stats_service` aggregating features

When you run the demo, it will:
1. Create the feature store with OpenLineage enabled
2. Apply the features (emits lineage events)
3. Materialize features (emits START/COMPLETE events)
4. Retrieve features (demonstrates online feature retrieval)

## Lineage Graph

After running the demo, you'll see this lineage in Marquez:

```
driver_stats_source ──┐
                      ├──→ feast_feature_views_openlineage_demo ──→ driver_hourly_stats
driver_id ────────────┘                                                      │
                                                                             ▼
                                                           feature_service_driver_stats_service ──→ driver_stats_service
```

## Learn More

- [Feast OpenLineage Reference](../../docs/reference/openlineage.md)
- [OpenLineage Documentation](https://openlineage.io/docs)
- [Marquez Project](https://marquezproject.ai)
