Feast provides native integration with MLflow for automatic feature lineage tracking alongside ML experiments. When enabled, every feature retrieval is logged to the active MLflow run.
- Which features did this model use? -- auto-logged on every `get_historical_features()` / `get_online_features()` call
- Which feature service should I use to serve this model? -- resolved from model URI via `store.mlflow.resolve_features()`
- Can I reproduce the exact training data? -- entity DataFrame saved as an MLflow artifact
- Which models break if I change a feature view? -- reverse index via the Feast UI `/api/mlflow-feature-usage` endpoint
- When was the feature store last updated? -- `feast apply` and `feast materialize` logged to a separate ops experiment

| Capability | How |
|---|---|
| Auto-log feature metadata | Tags on every retrieval inside an active MLflow run |
| Entity DataFrame archival | entity_df.parquet artifact for full reproducibility |
| Model registration with lineage | feast.feature_service tag propagated to model versions |
| Training-to-prediction linkage | store.mlflow.load_model() links prediction runs back to training runs |
| Model-to-feature resolution | Map any model URI back to its Feast feature service |
| Operation audit trail | feast apply / feast materialize logged to {project}-feast-ops |
| store.mlflow API | Single entry point: zero import mlflow, zero client objects |
| Feast UI integration | Per-feature-view usage stats and registered model associations |
MLflow is an optional dependency:

```bash
pip install feast[mlflow]
```

Add the mlflow section to your feature_store.yaml:

```yaml
project: my_project
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db
mlflow:
  enabled: true
  tracking_uri: http://127.0.0.1:5000  # optional, falls back to MLFLOW_TRACKING_URI env var
  auto_log: true                       # default
  auto_log_entity_df: false            # default
  entity_df_max_rows: 100000           # default
  log_operations: false                # default
  ops_experiment_suffix: "-feast-ops"  # default
```

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `false` | Master switch for the entire integration |
| `tracking_uri` | string | (none) | MLflow tracking server URI. Falls back to MLFLOW_TRACKING_URI env var, then MLflow default (./mlruns) |
| `auto_log` | bool | `true` | Automatically log feature metadata on every retrieval when an active MLflow run exists |
| `auto_log_entity_df` | bool | `false` | Save the entity DataFrame as entity_df.parquet artifact on historical retrieval |
| `entity_df_max_rows` | int | `100000` | Skip entity DataFrame artifact upload for DataFrames exceeding this limit |
| `log_operations` | bool | `false` | Log feast apply and feast materialize to a separate MLflow experiment |
| `ops_experiment_suffix` | string | `"-feast-ops"` | Suffix appended to project name for the operations experiment |

The tracking URI is resolved in this order:

1. `tracking_uri` field in `feature_store.yaml`
2. `MLFLOW_TRACKING_URI` environment variable
3. MLflow's default (`./mlruns` local directory)

This means you can omit tracking_uri from the YAML and set MLFLOW_TRACKING_URI in your environment instead. If neither is set, MLflow falls back to its default local ./mlruns directory.
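
The lookup order above can be sketched as a small pure-Python function. This is only an illustration of the documented precedence, not Feast's actual implementation: `resolve_tracking_uri` and its parameters are hypothetical names, and the environment is passed in as a mapping so the logic is easy to exercise.

```python
import os
from typing import Mapping, Optional

MLFLOW_DEFAULT_URI = "./mlruns"  # MLflow's default local directory


def resolve_tracking_uri(
    yaml_tracking_uri: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper mirroring the documented lookup order."""
    env = os.environ if env is None else env
    # 1. An explicit tracking_uri in feature_store.yaml wins.
    if yaml_tracking_uri:
        return yaml_tracking_uri
    # 2. Otherwise use the MLFLOW_TRACKING_URI environment variable.
    env_uri = env.get("MLFLOW_TRACKING_URI")
    if env_uri:
        return env_uri
    # 3. Finally fall back to MLflow's default location.
    return MLFLOW_DEFAULT_URI
```
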
When auto_log: true and an active MLflow run exists, each get_historical_features() or get_online_features() call records:
| Tag | Example | Description |
|---|---|---|
| `feast.project` | `my_project` | Feast project name |
| `feast.retrieval_type` | `historical` / `online` | Type of feature retrieval |
| `feast.feature_service` | `driver_activity_v1` | Auto-resolved feature service name (if matched) |
| `feast.feature_views` | `driver_hourly_stats` | Comma-separated feature view names |
| `feast.feature_refs` | `driver_hourly_stats:conv_rate,...` | All feature references |
| `feast.entity_count` | `200` | Number of entities in the request |
| `feast.feature_count` | `5` | Number of features retrieved |

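
Because feast.feature_refs is stored as a single flat string, tooling that reads the tag back usually has to split it into per-view feature lists. A minimal sketch, assuming the tag holds comma-separated `view:feature` references as in the example value above; `group_feature_refs` is a hypothetical helper rather than part of Feast, and the feature names in the sample string are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def group_feature_refs(tag_value: str) -> Dict[str, List[str]]:
    """Group 'view:feature' references by feature view name."""
    grouped = defaultdict(list)
    for ref in tag_value.split(","):
        ref = ref.strip()
        if not ref:
            continue  # tolerate empty segments and trailing commas
        view, _, feature = ref.partition(":")
        grouped[view].append(feature)
    return dict(grouped)


# Illustrative tag value; real values come from the MLflow run's tags.
sample = "driver_hourly_stats:conv_rate,driver_hourly_stats:acc_rate,order_stats:total"
print(group_feature_refs(sample))
```
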
| Metric | Example | Description |
|---|---|---|
| `feast.job_submission_sec` | `0.4321` | Feature retrieval duration in seconds |

When auto_log_entity_df: true and the entity DataFrame has fewer than entity_df_max_rows rows:
| Artifact | Description |
|---|---|
| `entity_df.parquet` | Full entity DataFrame used in the retrieval |

When a model is logged via store.mlflow.log_model():
| Artifact | Description |
|---|---|
| `feast_features.json` | JSON list of feature references the model was trained on |

Regardless of auto_log_entity_df, the following metadata is logged when present:
| Tag / Param | When | Description |
|---|---|---|
| `feast.entity_df_type` | Always | `dataframe`, `sql`, or `range` |
| `feast.entity_df_rows` | DataFrame input | Row count |
| `feast.entity_df_columns` | DataFrame input | Column names |
| `feast.entity_df_query` | SQL input | The SQL query string |
| `feast.start_date` / `feast.end_date` | Range-based input | Date range |

When log_operations: true, feast apply and feast materialize create self-contained runs in the {project}{ops_experiment_suffix} experiment (for example, my_project-feast-ops with the default suffix):
Apply runs:
| Tag / Metric | Example |
|---|---|
| `feast.operation` | `apply` |
| `feast.project` | `my_project` |
| `feast.feature_views_changed` | `driver_hourly_stats,order_stats` |
| `feast.feature_services_changed` | `driver_activity_v1` |
| `feast.entities_changed` | `driver,restaurant` |
| `feast.apply.feature_views_count` | `2` |
| `feast.apply.feature_services_count` | `1` |
| `feast.apply.entities_count` | `2` |

Materialize runs:
| Tag / Metric | Example |
|---|---|
| `feast.operation` | `materialize` / `materialize_incremental` |
| `feast.project` | `my_project` |
| `feast.materialize.feature_views` | `driver_hourly_stats` |
| `feast.materialize.start_date` | `2024-01-01T00:00:00` |
| `feast.materialize.end_date` | `2024-01-02T00:00:00` |
| `feast.materialize.duration_sec` | `12.3456` |

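
The operations experiment name is the project name with the configured suffix appended. A tiny sketch of that composition, useful when looking the experiment up from a script; `ops_experiment_name` is a hypothetical helper used only for illustration.

```python
def ops_experiment_name(project: str, suffix: str = "-feast-ops") -> str:
    """Compose {project}{ops_experiment_suffix} as described above."""
    return f"{project}{suffix}"


print(ops_experiment_name("my_project"))            # my_project-feast-ops
print(ops_experiment_name("my_project", "-audit"))  # my_project-audit
```
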
With the configuration above, feature metadata is logged automatically whenever there is an active MLflow run. No explicit import mlflow is needed — just use store.mlflow:

```python
from feast import FeatureStore

store = FeatureStore(".")

# entity_df: DataFrame of entity keys and timestamps, prepared elsewhere.
# train(): your own training function.
with store.mlflow.start_run(run_name="my_training"):
    training_df = store.get_historical_features(
        features=store.get_feature_service("driver_activity_v1"),
        entity_df=entity_df,
    ).to_df()
    # The run is now tagged with feast.feature_refs, feast.feature_views, etc.
    model = train(training_df)
    store.mlflow.log_model(model, "model")
```

No extra code is needed: the tags are written automatically.

store.mlflow is the primary way to interact with the Feast–MLflow integration. It provides Feast-enhanced versions of common MLflow operations, and exposes the underlying MlflowClient (store.mlflow.client) and the raw mlflow module (store.mlflow.mlflow) for everything else:

```python
from feast import FeatureStore
from sklearn.linear_model import LogisticRegression

store = FeatureStore(".")

# Training
with store.mlflow.start_run(run_name="v1_training"):
    df = store.get_historical_features(
        features=store.get_feature_service("driver_activity_v1"),
        entity_df=entity_df,
    ).to_df()
    # X, y: feature matrix and labels derived from df (not shown)
    model = LogisticRegression().fit(X, y)
    store.mlflow.log_model(model, "model")  # Feast-enhanced: saves feast_features.json
    train_run_id = store.mlflow.active_run_id

# Register model (auto-tags version with feast.feature_service)
store.mlflow.register_model(f"runs:/{train_run_id}/model", "driver_model")

# Prediction (auto-links to training run)
with store.mlflow.start_run(run_name="prediction"):
    model = store.mlflow.load_model("models:/driver_model/1")
    online_features = store.get_online_features(
        features=store.get_feature_service("driver_activity_v1"),
        entity_rows=[{"driver_id": 1001}],
    )
    predictions = model.predict(...)
```

For users who prefer a module-level import, feast.mlflow is a drop-in replacement for import mlflow that delegates to the same store.mlflow client under the hood:


```python
import feast.mlflow
from feast import FeatureStore

store = FeatureStore(".")  # auto-registers with feast.mlflow

with feast.mlflow.start_run(run_name="training"):
    df = store.get_historical_features(...).to_df()
    feast.mlflow.log_params({"lr": "0.01"})  # plain passthrough
    feast.mlflow.log_metrics({"f1": 0.85})   # plain passthrough
    feast.mlflow.log_model(model, "model")   # Feast-enhanced
```

feast.mlflow resolves its FeatureStore in this order:

1. Explicit `feast.mlflow.init(store)`: if called, overrides everything
2. Auto-registered: the most recently created `FeatureStore` with `mlflow.enabled=true` registers itself automatically
3. Auto-discovery: falls back to `FeatureStore(".")` from the current directory

In most cases, simply creating a FeatureStore(...) is enough — no init() needed.
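
The three-step precedence can be pictured with a small stand-in class. This is a sketch of the ordering only, under the assumption that each step is tried in turn until one yields a store; `StoreResolver` and its methods are hypothetical and do not reflect Feast's internal code.

```python
from typing import Any, Callable, Optional


class StoreResolver:
    """Hypothetical model of how a module could choose its FeatureStore."""

    def __init__(self, discover: Callable[[], Any]):
        self._explicit: Optional[Any] = None
        self._auto_registered: Optional[Any] = None
        self._discover = discover  # e.g. lambda: FeatureStore(".")

    def init(self, store: Any) -> None:
        """Explicit binding: overrides everything else."""
        self._explicit = store

    def register(self, store: Any) -> None:
        """Called by the most recently created MLflow-enabled store."""
        self._auto_registered = store

    def resolve(self) -> Any:
        if self._explicit is not None:
            return self._explicit
        if self._auto_registered is not None:
            return self._auto_registered
        return self._discover()  # last resort: auto-discovery


resolver = StoreResolver(discover=lambda: "store-from-cwd")
print(resolver.resolve())  # store-from-cwd
resolver.register("latest-created-store")
print(resolver.resolve())  # latest-created-store
resolver.init("explicit-store")
print(resolver.resolve())  # explicit-store
```
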
feast.mlflow raises clear errors on first use if something is misconfigured:
| Condition | Error |
|---|---|
| No `feature_store.yaml` in cwd and no store created | `RuntimeError` with guidance to call `feast.mlflow.init(store)` |
| `mlflow.enabled` is not set to `true` | `RuntimeError` with guidance to set `mlflow.enabled=true` |
| `mlflow` pip package not installed | `ImportError` with guidance to run `pip install feast[mlflow]` |

When mlflow.enabled is false (or omitted), store.mlflow returns None, allowing callers to guard with if store.mlflow:. The feast.mlflow module raises RuntimeError only when you attempt to use it without an enabled store.
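
A short sketch of that guard pattern follows. It uses `SimpleNamespace` stand-ins so it runs without Feast or MLflow installed; with a real `FeatureStore`, only the `feature_service_for` function would be needed. The function name is an assumption made for this example.

```python
from types import SimpleNamespace


def feature_service_for(store, model_uri):
    """Return the model's feature service, or None when MLflow is disabled."""
    if store.mlflow:  # None when mlflow.enabled is false or omitted
        return store.mlflow.resolve_features(model_uri)
    return None


# Stand-in objects that mimic a disabled and an enabled store.
disabled_store = SimpleNamespace(mlflow=None)
enabled_store = SimpleNamespace(
    mlflow=SimpleNamespace(resolve_features=lambda uri: "driver_activity_v1")
)

print(feature_service_for(disabled_store, "models:/driver_model/1"))  # None
print(feature_service_for(enabled_store, "models:/driver_model/1"))   # driver_activity_v1
```
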
These functions add automatic Feast tagging and lineage on top of their MLflow counterparts:
| Function | Enhancement |
|---|---|
| `store.mlflow.start_run(run_name, tags)` | Auto-tags run with `feast.project` |
| `store.mlflow.log_model(model, path, flavor)` | Auto-attaches `feast_features.json` artifact |
| `store.mlflow.register_model(model_uri, name)` | Auto-tags model version with `feast.feature_service` |
| `store.mlflow.load_model(model_uri)` | Auto-tags prediction run with training lineage |

Supported model flavors for log_model(): sklearn, pytorch, xgboost, lightgbm, tensorflow, keras, pyfunc.
These are unique to the Feast integration and have no mlflow equivalent:
| Function | Description |
|---|---|
| `store.mlflow.resolve_features(model_uri)` | Resolve model URI to Feast feature service name |
| `store.mlflow.get_training_entity_df(run_id, ...)` | Recover entity DataFrame from a past MLflow run |
| `store.mlflow.log_training_dataset(df, dataset_name)` | Log a training DataFrame as an MLflow dataset input |
| `store.mlflow.active_run_id` | Current active MLflow run ID (or `None`) |
| `store.mlflow.client` | The underlying `MlflowClient` instance for advanced queries |
| `feast.mlflow.init(store)` | Explicitly bind `feast.mlflow` module to a `FeatureStore` (optional) |

The feast.mlflow module delegates any attribute not listed above to the raw mlflow module. This means you can use feast.mlflow as a drop-in replacement for import mlflow:

```python
feast.mlflow.log_params(params)         # passes through to mlflow.log_params
feast.mlflow.log_metrics(metrics)
feast.mlflow.set_tag("env", "staging")
feast.mlflow.MlflowClient()
```

store.mlflow does not have this passthrough: it only exposes the Feast-enhanced and Feast-only methods listed above. To access raw mlflow functions from store.mlflow, use the escape hatches:

```python
store.mlflow.client.log_param(run_id, "lr", "0.01")  # via MlflowClient instance
store.mlflow.mlflow.log_params(params)               # via raw mlflow module
```

To resolve a model URI to its feature service:

```python
from feast import FeatureStore

store = FeatureStore(".")
fs_name = store.mlflow.resolve_features("models:/driver_model/1")
# Returns: "driver_activity_v1"
```

Resolution order:

1. Model version tag `feast.feature_service` (set by `register_model()`)
2. Training run tag `feast.feature_service` (set by auto-logging)

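
The two-step lookup can be sketched with plain dictionaries standing in for MLflow tag collections. This assumes the first non-empty tag wins and that `None` is returned when neither is set; `resolve_feature_service` is a hypothetical name and not the library's implementation.

```python
from typing import Mapping, Optional

FEATURE_SERVICE_TAG = "feast.feature_service"


def resolve_feature_service(
    model_version_tags: Mapping[str, str],
    training_run_tags: Mapping[str, str],
) -> Optional[str]:
    """Prefer the model version tag, then fall back to the training run tag."""
    return (
        model_version_tags.get(FEATURE_SERVICE_TAG)
        or training_run_tags.get(FEATURE_SERVICE_TAG)
        or None
    )


# The model version carries no tag, so the training run's tag is used.
print(resolve_feature_service({}, {FEATURE_SERVICE_TAG: "driver_activity_v1"}))
```
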
To retrain on the entity DataFrame recorded by an earlier run:

```python
from feast import FeatureStore

store = FeatureStore(".")

# Recover the entity DataFrame archived by a previous run.
entity_df = store.mlflow.get_training_entity_df(run_id="abc123")

with store.mlflow.start_run(run_name="retrain_v2"):
    new_df = store.get_historical_features(
        features=store.get_feature_service("driver_activity_v1"),
        entity_df=entity_df,
    ).to_df()
    model = train(new_df)  # train(): your own training function
    store.mlflow.log_model(model, "model")
```

This requires auto_log_entity_df: true to have been enabled when the original run was recorded.

The Feast UI server exposes three API endpoints that aggregate data from MLflow:
| Endpoint | Description |
|---|---|
| `/api/mlflow-runs` | All Feast-tagged MLflow runs with linked registered models |
| `/api/mlflow-feature-usage` | Per-feature-view usage stats (run count, last used, associated models) |
| `/api/mlflow-feature-models` | Reverse index of feature refs to registered models |

The feature view detail page in the Feast UI displays:
- MLflow Training Runs count and Last Used date in the header stats
- An MLflow Usage panel showing training run count, relative last-used time, and a table of registered models that depend on the feature view
Start the Feast UI with:

```bash
feast ui --host 127.0.0.1 --port 8888
```
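
With the UI server running, the endpoints listed earlier can be queried from a script. The sketch below uses only the standard library and assumes the endpoints answer plain GET requests with JSON; the response schema is not described here, so the payload is returned as parsed without further interpretation. `endpoint_url` and `fetch` are hypothetical helper names.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

FEAST_UI_BASE = "http://127.0.0.1:8888"  # host and port from the command above

ENDPOINTS = {
    "runs": "/api/mlflow-runs",
    "feature-usage": "/api/mlflow-feature-usage",
    "feature-models": "/api/mlflow-feature-models",
}


def endpoint_url(name: str, base: str = FEAST_UI_BASE) -> str:
    """Build the full URL for one of the documented endpoints."""
    return urljoin(base, ENDPOINTS[name])


def fetch(name: str, base: str = FEAST_UI_BASE, timeout: float = 10.0):
    """GET an endpoint and parse the body as JSON (assumed response format)."""
    with urlopen(endpoint_url(name, base), timeout=timeout) as resp:
        return json.load(resp)


print(endpoint_url("feature-usage"))  # http://127.0.0.1:8888/api/mlflow-feature-usage
```

Calling `fetch("feature-usage")` requires the Feast UI to be reachable at the base URL; the URL builder itself has no network dependency.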