mlflow.md

MLflow Integration

Feast provides native integration with MLflow for automatic feature lineage tracking alongside ML experiments. When enabled, every feature retrieval is logged to the active MLflow run.

Overview

Which features did this model use? -- auto-logged on every get_historical_features() / get_online_features() call
Which feature service should I use to serve this model? -- resolved from model URI via store.mlflow.resolve_features()
Can I reproduce the exact training data? -- entity DataFrame saved as an MLflow artifact
Which models break if I change a feature view? -- reverse index via the Feast UI /api/mlflow-feature-usage endpoint
When was the feature store last updated? -- feast apply and feast materialize logged to a separate ops experiment

Capabilities

Capability	How
Auto-log feature metadata	Tags on every retrieval inside an active MLflow run
Entity DataFrame archival	`entity_df.parquet` artifact for full reproducibility
Model registration with lineage	`feast.feature_service` tag propagated to model versions
Training-to-prediction linkage	`store.mlflow.load_model()` links prediction runs back to training runs
Model-to-feature resolution	Map any model URI back to its Feast feature service
Operation audit trail	`feast apply` / `feast materialize` logged to `{project}-feast-ops`
`store.mlflow` API	Single entry point — zero `import mlflow`, zero client objects
Feast UI integration	Per-feature-view usage stats and registered model associations

Installation

MLflow is an optional dependency:

pip install feast[mlflow]

Configuration

Add the mlflow section to your feature_store.yaml:

project: my_project
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db

mlflow:
  enabled: true
  tracking_uri: http://127.0.0.1:5000   # optional, falls back to MLFLOW_TRACKING_URI env var
  auto_log: true                         # default
  auto_log_entity_df: false              # default
  entity_df_max_rows: 100000             # default
  log_operations: false                  # default
  ops_experiment_suffix: "-feast-ops"    # default

Configuration options

Option	Type	Default	Description
`enabled`	bool	`false`	Master switch for the entire integration
`tracking_uri`	string	(none)	MLflow tracking server URI. Falls back to `MLFLOW_TRACKING_URI` env var, then MLflow default (`./mlruns`)
`auto_log`	bool	`true`	Automatically log feature metadata on every retrieval when an active MLflow run exists
`auto_log_entity_df`	bool	`false`	Save the entity DataFrame as `entity_df.parquet` artifact on historical retrieval
`entity_df_max_rows`	int	`100000`	Skip entity DataFrame artifact upload for DataFrames exceeding this limit
`log_operations`	bool	`false`	Log `feast apply` and `feast materialize` to a separate MLflow experiment
`ops_experiment_suffix`	string	`"-feast-ops"`	Suffix appended to project name for the operations experiment

Tracking URI resolution

The tracking URI is resolved in this order:

tracking_uri field in feature_store.yaml
MLFLOW_TRACKING_URI environment variable
MLflow's default (./mlruns local directory)

This means you can omit tracking_uri from the YAML and set MLFLOW_TRACKING_URI in your environment instead, or it would be pulled from ./mlruns automatically when both are not set.

What gets logged

Tags on retrieval runs

When auto_log: true and an active MLflow run exists, each get_historical_features() or get_online_features() call records:

Tag	Example	Description
`feast.project`	`my_project`	Feast project name
`feast.retrieval_type`	`historical` / `online`	Type of feature retrieval
`feast.feature_service`	`driver_activity_v1`	Auto-resolved feature service name (if matched)
`feast.feature_views`	`driver_hourly_stats`	Comma-separated feature view names
`feast.feature_refs`	`driver_hourly_stats:conv_rate,...`	All feature references
`feast.entity_count`	`200`	Number of entities in the request
`feast.feature_count`	`5`	Number of features retrieved

Metrics

Metric	Example	Description
`feast.job_submission_sec`	`0.4321`	Feature retrieval duration in seconds

Artifacts

When auto_log_entity_df: true and the entity DataFrame has fewer than entity_df_max_rows rows:

Artifact	Description
`entity_df.parquet`	Full entity DataFrame used in the retrieval

When a model is logged via store.mlflow.log_model():

Artifact	Description
`feast_features.json`	JSON list of feature references the model was trained on

Entity DataFrame metadata

Regardless of auto_log_entity_df, the following metadata is logged when present:

Tag / Param	When	Description
`feast.entity_df_type`	Always	`dataframe`, `sql`, or `range`
`feast.entity_df_rows`	DataFrame input	Row count
`feast.entity_df_columns`	DataFrame input	Column names
`feast.entity_df_query`	SQL input	The SQL query string
`feast.start_date` / `feast.end_date`	Range-based input	Date range

Operation logs

When log_operations: true, feast apply and feast materialize create self-contained runs in the {project}{ops_experiment_suffix} experiment (default: my_project-feast-ops):

Apply runs:

Tag / Metric	Example
`feast.operation`	`apply`
`feast.project`	`my_project`
`feast.feature_views_changed`	`driver_hourly_stats,order_stats`
`feast.feature_services_changed`	`driver_activity_v1`
`feast.entities_changed`	`driver,restaurant`
`feast.apply.feature_views_count`	`2`
`feast.apply.feature_services_count`	`1`
`feast.apply.entities_count`	`2`

Materialize runs:

Tag / Metric	Example
`feast.operation`	`materialize` / `materialize_incremental`
`feast.project`	`my_project`
`feast.materialize.feature_views`	`driver_hourly_stats`
`feast.materialize.start_date`	`2024-01-01T00:00:00`
`feast.materialize.end_date`	`2024-01-02T00:00:00`
`feast.materialize.duration_sec`	`12.3456`

Usage

Automatic logging (zero code)

With the configuration above, feature metadata is logged automatically whenever there is an active MLflow run. No explicit import mlflow is needed — just use store.mlflow:

from feast import FeatureStore

store = FeatureStore(".")

with store.mlflow.start_run(run_name="my_training"):
    training_df = store.get_historical_features(
        features=store.get_feature_service("driver_activity_v1"),
        entity_df=entity_df,
    ).to_df()
    # The run is now tagged with feast.feature_refs, feast.feature_views, etc.

    model = train(training_df)
    store.mlflow.log_model(model, "model")

No extra code needed — the tags are written automatically.

`store.mlflow` API (recommended)

store.mlflow is the primary way to interact with the Feast–MLflow integration. It provides Feast-enhanced versions of common MLflow operations, and delegates everything else to the raw mlflow module:

from feast import FeatureStore
from sklearn.linear_model import LogisticRegression

store = FeatureStore(".")

# Training
with store.mlflow.start_run(run_name="v1_training"):
    df = store.get_historical_features(
        features=store.get_feature_service("driver_activity_v1"),
        entity_df=entity_df,
    ).to_df()

    model = LogisticRegression().fit(X, y)
    store.mlflow.log_model(model, "model")     # Feast-enhanced: saves feast_features.json
    train_run_id = store.mlflow.active_run_id

# Register model (auto-tags version with feast.feature_service)
store.mlflow.register_model(f"runs:/{train_run_id}/model", "driver_model")

# Prediction (auto-links to training run)
with store.mlflow.start_run(run_name="prediction"):
    model = store.mlflow.load_model("models:/driver_model/1")
    online_features = store.get_online_features(
        features=store.get_feature_service("driver_activity_v1"),
        entity_rows=[{"driver_id": 1001}],
    )
    predictions = model.predict(...)

`feast.mlflow` module API (alternative)

For users who prefer a module-level import, feast.mlflow is a drop-in replacement for import mlflow that delegates to the same store.mlflow client under the hood:

import feast.mlflow
from feast import FeatureStore

store = FeatureStore(".")   # auto-registers with feast.mlflow

with feast.mlflow.start_run(run_name="training"):
    df = store.get_historical_features(...).to_df()
    feast.mlflow.log_params({"lr": "0.01"})     # plain passthrough
    feast.mlflow.log_metrics({"f1": 0.85})       # plain passthrough
    feast.mlflow.log_model(model, "model")       # Feast-enhanced

Store resolution

feast.mlflow resolves its FeatureStore in this order:

Explicit feast.mlflow.init(store) — if called, overrides everything
Auto-registered — the most recently created FeatureStore with mlflow.enabled=true registers itself automatically
Auto-discovery — falls back to FeatureStore(".") from the current directory

In most cases, simply creating a FeatureStore(...) is enough — no init() needed.

Error handling

feast.mlflow raises clear errors on first use if something is misconfigured:

Condition	Error
No `feature_store.yaml` in cwd and no store created	`RuntimeError` with guidance to call `feast.mlflow.init(store)`
`mlflow.enabled` is not set to `true`	`RuntimeError` with guidance to set `mlflow.enabled=true`
`mlflow` pip package not installed	`ImportError` with guidance to run `pip install feast[mlflow]`

When mlflow.enabled is false (or omitted), store.mlflow returns None, allowing callers to guard with if store.mlflow:. The feast.mlflow module raises RuntimeError only when you attempt to use it without an enabled store.

Feast-enhanced functions

These functions add automatic Feast tagging and lineage on top of their MLflow counterparts:

Function	Enhancement
`store.mlflow.start_run(run_name, tags)`	Auto-tags run with `feast.project`
`store.mlflow.log_model(model, path, flavor)`	Auto-attaches `feast_features.json` artifact
`store.mlflow.register_model(model_uri, name)`	Auto-tags model version with `feast.feature_service`
`store.mlflow.load_model(model_uri)`	Auto-tags prediction run with training lineage

Supported model flavors for log_model(): sklearn, pytorch, xgboost, lightgbm, tensorflow, keras, pyfunc.

Feast-only functions

These are unique to the Feast integration and have no mlflow equivalent:

Function	Description
`store.mlflow.resolve_features(model_uri)`	Resolve model URI to Feast feature service name
`store.mlflow.get_training_entity_df(run_id, ...)`	Recover entity DataFrame from a past MLflow run
`store.mlflow.log_training_dataset(df, dataset_name)`	Log a training DataFrame as an MLflow dataset input
`store.mlflow.active_run_id`	Current active MLflow run ID (or `None`)
`store.mlflow.client`	The underlying `MlflowClient` instance for advanced queries
`feast.mlflow.init(store)`	Explicitly bind `feast.mlflow` module to a `FeatureStore` (optional)

Passthrough behavior

The feast.mlflow module delegates any attribute not listed above to the raw mlflow module. This means you can use feast.mlflow as a drop-in replacement for import mlflow:

feast.mlflow.log_params(params)             # passes through to mlflow.log_params
feast.mlflow.log_metrics(metrics)
feast.mlflow.set_tag("env", "staging")
feast.mlflow.MlflowClient()

store.mlflow does not have this passthrough — it only exposes the Feast-enhanced and Feast-only methods listed above. To access raw mlflow functions from store.mlflow, use the escape hatches:

store.mlflow.client.log_param(run_id, "lr", "0.01")  # via MlflowClient instance
store.mlflow.mlflow.log_params(params)                # via raw mlflow module

Resolve a model back to its feature service

from feast import FeatureStore

store = FeatureStore(".")
fs_name = store.mlflow.resolve_features("models:/driver_model/1")
# Returns: "driver_activity_v1"

Resolution order:

Model version tag feast.feature_service (set by register_model())
Training run tag feast.feature_service (set by auto-logging)

Reproduce training from a past run

from feast import FeatureStore

store = FeatureStore(".")

entity_df = store.mlflow.get_training_entity_df(run_id="abc123")

with store.mlflow.start_run(run_name="retrain_v2"):
    new_df = store.get_historical_features(
        features=store.get_feature_service("driver_activity_v1"),
        entity_df=entity_df,
    ).to_df()
    model = train(new_df)
    store.mlflow.log_model(model, "model")

This requires auto_log_entity_df: true to have been enabled when the original run was recorded.

Feast UI integration

The Feast UI server exposes three API endpoints that aggregate data from MLflow:

Endpoint	Description
`/api/mlflow-runs`	All Feast-tagged MLflow runs with linked registered models
`/api/mlflow-feature-usage`	Per-feature-view usage stats (run count, last used, associated models)
`/api/mlflow-feature-models`	Reverse index of feature refs to registered models

The feature view detail page in the Feast UI displays:

MLflow Training Runs count and Last Used date in the header stats
An MLflow Usage panel showing training run count, relative last-used time, and a table of registered models that depend on the feature view

Start the Feast UI with:

feast ui --host 127.0.0.1 --port 8888

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MLflow Integration

Overview

Capabilities

Installation

Configuration

Configuration options

Tracking URI resolution

What gets logged

Tags on retrieval runs

Metrics

Artifacts

Entity DataFrame metadata

Operation logs

Usage

Automatic logging (zero code)

`store.mlflow` API (recommended)

`feast.mlflow` module API (alternative)

Store resolution

Error handling

Feast-enhanced functions

Feast-only functions

Passthrough behavior

Resolve a model back to its feature service

Reproduce training from a past run

Feast UI integration

FilesExpand file tree

mlflow.md

Latest commit

History

mlflow.md

File metadata and controls

MLflow Integration

Overview

Capabilities

Installation

Configuration

Configuration options

Tracking URI resolution

What gets logged

Tags on retrieval runs

Metrics

Artifacts

Entity DataFrame metadata

Operation logs

Usage

Automatic logging (zero code)

store.mlflow API (recommended)

feast.mlflow module API (alternative)

Store resolution

Error handling

Feast-enhanced functions

Feast-only functions

Passthrough behavior

Resolve a model back to its feature service

Reproduce training from a past run

Feast UI integration

`store.mlflow` API (recommended)

`feast.mlflow` module API (alternative)