Merged
3 changes: 2 additions & 1 deletion README.md
@@ -254,7 +254,8 @@ The list below contains the functionality that contributors are planning to deve
* [x] [Offline Feature Server (alpha)](https://docs.feast.dev/reference/feature-servers/offline-feature-server)
* [x] [Registry server (alpha)](https://github.com/feast-dev/feast/blob/master/docs/reference/feature-servers/registry-server.md)
* **Data Quality Management (See [RFC](https://docs.google.com/document/d/110F72d4NTv80p35wDSONxhhPBqWRwbZXG4f9mNEMd98/edit))**
* [x] Data profiling and validation (Great Expectations)
* [x] ~~Data profiling and validation (Great Expectations)~~ (deprecated)
* [x] [Feature Quality Monitoring](https://docs.feast.dev/how-to-guides/feature-monitoring) — built-in metrics, drift detection, serving log monitoring, and UI dashboard
* **Feature Discovery and Governance**
* [x] Python SDK for browsing feature registry
* [x] CLI for browsing feature registry
2 changes: 1 addition & 1 deletion docs/README.md
@@ -71,7 +71,7 @@ Feast helps ML platform/MLOps teams with DevOps experience productionize real-ti
* **batch feature engineering**: Feast supports on-demand and streaming transformations. Feast is also investing in supporting batch transformations.
* **native streaming feature integration:** Feast enables users to push streaming features, but does not pull from streaming sources or manage streaming pipelines.
* **lineage:** Feast helps tie feature values to model versions, but is not a complete solution for capturing end-to-end lineage from raw data sources to model versions. Feast also has community contributed plugins with [DataHub](https://datahubproject.io/docs/generated/ingestion/sources/feast/) and [Amundsen](https://github.com/amundsen-io/amundsen/blob/4a9d60176767c4d68d1cad5b093320ea22e26a49/databuilder/databuilder/extractor/feast\_extractor.py).
* **data quality / drift detection**: Feast has experimental integrations with [Great Expectations](https://greatexpectations.io/), but is not purpose built to solve data drift / data quality issues. This requires more sophisticated monitoring across data pipelines, served feature values, labels, and model versions.
* **data quality / drift detection**: Feast now includes built-in [Feature Quality Monitoring](how-to-guides/feature-monitoring.md) that computes statistical metrics (null rates, distributions, percentiles), detects drift across batch data and serving logs, and provides a monitoring UI dashboard. The older Great Expectations integration is deprecated.
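The diff does not say which statistic the monitoring system uses to detect drift. As a hedged illustration of the general idea only, the sketch below compares a baseline histogram with a current one using the Population Stability Index (PSI), a commonly used drift score. The function name and the `eps` smoothing value are assumptions made for this example, not Feast APIs.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline_counts: Sequence[int],
    current_counts: Sequence[int],
    eps: float = 1e-6,
) -> float:
    """Return the PSI between two histograms that share the same bins.

    Hypothetical helper for illustration; not part of Feast.
    """
    if len(baseline_counts) != len(current_counts):
        raise ValueError("histograms must have the same number of bins")
    baseline_total = sum(baseline_counts)
    current_total = sum(current_counts)
    if baseline_total == 0 or current_total == 0:
        raise ValueError("histograms must not be empty")

    psi = 0.0
    for b, c in zip(baseline_counts, current_counts):
        # Convert counts to proportions; clamp to eps so log() is defined
        # when a bin is empty in one of the two histograms.
        p = max(b / baseline_total, eps)
        q = max(c / current_total, eps)
        psi += (q - p) * math.log(q / p)
    return psi


if __name__ == "__main__":
    baseline = [50, 30, 20]  # e.g. bin counts from batch training data
    current = [20, 30, 50]   # e.g. bin counts from serving logs
    print(f"PSI = {population_stability_index(baseline, current):.3f}")
```

A score of zero means the two binned distributions are identical, and larger values mean a larger shift. A common rule of thumb treats values above roughly 0.2 as worth investigating, but any threshold should be tuned per feature.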

## Example use cases

4 changes: 2 additions & 2 deletions docs/SUMMARY.md
@@ -57,7 +57,7 @@
* [Fraud detection on GCP](tutorials/tutorials-overview/fraud-detection.md)
* [Real-time credit scoring on AWS](tutorials/tutorials-overview/real-time-credit-scoring-on-aws.md)
* [Driver stats on Snowflake](tutorials/tutorials-overview/driver-stats-on-snowflake.md)
* [Validating historical features with Great Expectations](tutorials/validating-historical-features.md)
* [\[Deprecated\] Validating historical features with Great Expectations](tutorials/validating-historical-features.md)
* [Building streaming features](tutorials/building-streaming-features.md)
* [Retrieval Augmented Generation (RAG) with Feast](tutorials/rag-with-docling.md)
* [RAG Fine Tuning with Feast and Milvus](../examples/rag-retriever/README.md)
@@ -205,7 +205,7 @@
* [\[Beta\] On demand feature view](reference/beta-on-demand-feature-view.md)
* [\[Alpha\] Static Artifacts Loading](reference/alpha-static-artifacts.md)
* [\[Alpha\] Vector Database](reference/alpha-vector-database.md)
* [\[Alpha\] Data quality monitoring](reference/dqm.md)
* [\[Deprecated\] Data quality monitoring (Great Expectations)](reference/dqm.md)
* [\[Alpha\] Streaming feature computation with Denormalized](reference/denormalized.md)
* [\[Alpha\] Feature View Versioning](reference/alpha-feature-view-versioning.md)
* [OpenLineage Integration](reference/openlineage.md)
15 changes: 14 additions & 1 deletion docs/adr/ADR-0011-data-quality-monitoring.md
@@ -83,8 +83,21 @@ If validation fails, a `ValidationFailed` exception is raised with details for a
- Dependency on Great Expectations adds to the install footprint (optional via `feast[ge]`).
- Automatic profiling capabilities are limited; manual expectation crafting is recommended.

## Superseded

This ADR documents the original GE-based approach, which is now **deprecated**. It has been superseded by Feast's built-in [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md) system (introduced in 2025), which provides:

- Automatic metric computation (null rates, percentiles, histograms) with no external dependencies
- Monitoring across batch data and serving logs
- CLI (`feast monitor run`) and REST API for automation
- Built-in UI monitoring dashboard
- Support for all offline store backends via SQL push-down

The GE-based integration may be removed in a future release.
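
To make the metric names above concrete, here is a minimal sketch of a null rate, percentiles, and an equal-width histogram in plain Python. It is an illustration only: the function names are hypothetical, and this is not Feast's implementation, which according to this ADR pushes computation down to the offline store via SQL.

```python
import math
import statistics
from typing import Dict, List, Optional, Sequence


def _is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing values."""
    return value is None or math.isnan(value)


def null_rate(values: Sequence[Optional[float]]) -> float:
    """Fraction of entries that are missing."""
    if not values:
        return 0.0
    return sum(1 for v in values if _is_missing(v)) / len(values)


def percentiles(
    values: Sequence[Optional[float]], points: Sequence[int] = (50, 95, 99)
) -> Dict[int, float]:
    """Selected percentiles (1..99) of the non-missing values."""
    clean = sorted(v for v in values if not _is_missing(v))
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    cuts = statistics.quantiles(clean, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in points}


def histogram(values: Sequence[Optional[float]], bins: int = 4) -> List[int]:
    """Counts per equal-width bin over the non-missing values."""
    clean = [v for v in values if not _is_missing(v)]
    low, high = min(clean), max(clean)
    width = (high - low) / bins or 1.0  # avoid a zero width for constant data
    counts = [0] * bins
    for v in clean:
        # The maximum value falls into the last bin rather than a new one.
        counts[min(int((v - low) / width), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    sample = [0.0, 1.0, 2.0, 3.0, None, 4.0]
    print(null_rate(sample), percentiles(sample, (50,)), histogram(sample))
```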

## References

- Original RFC: Feast RFC-027: Data Quality Monitoring
- Implementation: `sdk/python/feast/dqm/`, `sdk/python/feast/saved_dataset.py`
- Documentation: [Data Quality Monitoring](../reference/dqm.md)
- Documentation: [Data Quality Monitoring (deprecated)](../reference/dqm.md)
- **New system:** [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md)
2 changes: 1 addition & 1 deletion docs/getting-started/concepts/dataset.md
@@ -1,6 +1,6 @@
# \[Alpha] Saved dataset

Feast datasets allow for conveniently saving dataframes that include both features and entities to be subsequently used for data analysis and model training. [Data Quality Monitoring](https://docs.google.com/document/d/110F72d4NTv80p35wDSONxhhPBqWRwbZXG4f9mNEMd98) was the primary motivation for creating dataset concept.
Feast datasets allow for conveniently saving dataframes that include both features and entities to be subsequently used for data analysis and model training. Data Quality Monitoring was the original motivation for creating the dataset concept. Note that the Great Expectations-based validation that used saved datasets is now deprecated in favor of Feast's built-in [Feature Quality Monitoring](../../how-to-guides/feature-monitoring.md) system, which does not require saved datasets.

Dataset's metadata is stored in the Feast registry and raw data (features, entities, additional input keys and timestamp) is stored in the [offline store](../components/offline-store.md).

@@ -51,7 +51,7 @@ To fully implement the interface for the offline store, you will need to impleme
* `pull_latest_from_table_or_query` is invoked when running materialization (using the `feast materialize` or `feast materialize-incremental` commands, or the corresponding `FeatureStore.materialize()` method). This method pulls data from the offline store, and the `FeatureStore` class takes care of writing this data into the online store.
* `get_historical_features` is invoked when reading values from the offline store using the `FeatureStore.get_historical_features()` method. Typically, this method is used to retrieve features when training ML models.
* (optional) `offline_write_batch` is a method that supports directly pushing a pyarrow table to a feature view. Given a feature view with a specific schema, this function should write the pyarrow table to the batch source defined. More details about the push api can be found [here](../docs/reference/data-sources/push.md). This method only needs implementation if you want to support the push api in your offline store.
* (optional) `pull_all_from_table_or_query` is a method that pulls all the data from an offline store from a specified start date to a specified end date. This method is only used for **SavedDatasets** as part of data quality monitoring validation.
* (optional) `pull_all_from_table_or_query` is a method that pulls all the data from an offline store from a specified start date to a specified end date. This method is used for **SavedDatasets** and as a fallback compute path for the [Feature Quality Monitoring](../../how-to-guides/feature-monitoring.md) system (backends without native SQL push-down).
* (optional) `write_logged_features` is a method that takes a pyarrow table or a path that points to a parquet file and writes the data to a source defined by `LoggingSource` and `LoggingConfig`. This method is only used internally for **SavedDatasets**.

{% code title="feast_custom_offline_store/file.py" %}
8 changes: 8 additions & 0 deletions docs/how-to-guides/feature-monitoring.md
@@ -462,3 +462,11 @@ The monitoring page is always accessible in the sidebar. To see actual data:

2. Run `feast apply` — this computes baseline metrics automatically
3. Schedule `feast monitor run` (or click "Compute Metrics" in the UI) to generate daily/weekly/monthly metrics

## Related: Operational and SOX Metrics

Feature Quality Monitoring focuses on **data-level** metrics (distributions, null rates, drift). Feast also provides **operational metrics** for infrastructure observability:

- **Prometheus metrics** (`feast_offline_store_*`, `feast_online_store_*`) — latency, throughput, and error rates for offline/online store operations. See [Python Feature Server — Metrics](../reference/feature-servers/python-feature-server.md).
- **SOX audit logging** (`feast.audit`) — structured audit events for compliance tracking of feature store operations.
- **OpenTelemetry integration** — distributed tracing for feature serving requests. See [OpenTelemetry Integration](../getting-started/components/open-telemetry.md).
2 changes: 1 addition & 1 deletion docs/reference/codebase-structure.md
@@ -28,7 +28,7 @@ The majority of Feast logic lives in these Python files:

There are also several important submodules:
* `infra/` contains all the infrastructure components, such as the provider, offline store, online store, batch materialization engine, and registry.
* `dqm/` covers data quality monitoring, such as the dataset profiler.
* `dqm/` covers data quality monitoring. The legacy Great Expectations profiler (`profilers/ge_profiler`) is deprecated; see [`monitoring/`](../../sdk/python/feast/monitoring/) for the current built-in monitoring system.
* `diff/` covers the logic for determining how to apply infrastructure changes upon feature repo changes (e.g. the output of `feast plan` and `feast apply`).
* `embedded_go/` covers the Go feature server.
* `ui/` contains the embedded Web UI, to be launched on the `feast ui` command.
84 changes: 48 additions & 36 deletions docs/reference/dqm.md
@@ -1,67 +1,50 @@
# Data Quality Monitoring

Data Quality Monitoring (DQM) is a Feast module aimed to help users to validate their data with the user-curated set of rules.
Validation could be applied during:
* Historical retrieval (training dataset generation)
* [planned] Writing features into an online store
* [planned] Reading features from an online store
{% hint style="warning" %}
**Deprecated:** The Great Expectations-based validation described on this page is deprecated and will be removed in a future release. It has been superseded by Feast's built-in [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md) system, which provides richer metrics (histograms, percentiles, drift detection), works across batch data and serving logs, requires no external dependencies, and includes a built-in UI dashboard.

Its goal is to address several complex data problems, namely:
* Data consistency - new training datasets can be significantly different from previous datasets. This might require a change in model architecture.
* Issues/bugs in the upstream pipeline - bugs in upstream pipelines can cause invalid values to overwrite existing valid values in an online store.
* Training/serving skew - distribution shift could significantly decrease the performance of the model.
Please migrate to the new monitoring system. See the [Feature Quality Monitoring guide](../how-to-guides/feature-monitoring.md) for setup instructions.
{% endhint %}

> To monitor data quality, we check that the characteristics of the tested dataset (aka the tested dataset's profile) are "equivalent" to the characteristics of the reference dataset.
> How exactly profile equivalency should be measured is up to the user.
## Legacy: Great Expectations Integration

The following documents the deprecated Great Expectations-based validation that was previously the only DQM option in Feast. This integration relied on `pip install 'feast[ge]'` and only supported validation during historical retrieval.

---

### Overview

The validation process consists of the following steps:
1. User prepares reference dataset (currently only [saved datasets](../getting-started/concepts/dataset.md) from historical retrieval are supported).
2. User defines profiler function, which should produce profile by given dataset (currently only profilers based on [Great Expectations](https://docs.greatexpectations.io) are allowed).
3. Validation of tested dataset is performed with reference dataset and profiler provided as parameters.
The legacy validation process consists of the following steps:
1. User prepares reference dataset (only [saved datasets](../getting-started/concepts/dataset.md) from historical retrieval are supported).
2. User defines a profiler function that produces a profile using [Great Expectations](https://docs.greatexpectations.io).
3. Validation of the tested dataset is performed with the reference dataset and profiler provided as parameters.

### Preparations
Feast with Great Expectations support can be installed via
### Installation
```shell
pip install 'feast[ge]'
```

### Dataset profile
Currently, Feast supports only [Great Expectation's](https://greatexpectations.io/) [ExpectationSuite](https://legacy.docs.greatexpectations.io/en/latest/autoapi/great_expectations/core/expectation_suite/index.html#great_expectations.core.expectation_suite.ExpectationSuite)
as dataset's profile. Hence, the user needs to define a function (profiler) that would receive a dataset and return an [ExpectationSuite](https://legacy.docs.greatexpectations.io/en/latest/autoapi/great_expectations/core/expectation_suite/index.html#great_expectations.core.expectation_suite.ExpectationSuite).

Great Expectations supports automatic profiling as well as manually specifying expectations:
This integration uses [Great Expectations'](https://greatexpectations.io/) [ExpectationSuite](https://legacy.docs.greatexpectations.io/en/latest/autoapi/great_expectations/core/expectation_suite/index.html#great_expectations.core.expectation_suite.ExpectationSuite)
as the dataset profile format. The user defines a profiler function that receives a dataset and returns an ExpectationSuite.

```python
from great_expectations.dataset import Dataset
from great_expectations.core.expectation_suite import ExpectationSuite

from feast.dqm.profilers.ge_profiler import ge_profiler

@ge_profiler
def automatic_profiler(dataset: Dataset) -> ExpectationSuite:
    from great_expectations.profile.user_configurable_profiler import UserConfigurableProfiler

    return UserConfigurableProfiler(
        profile_dataset=dataset,
        ignored_columns=['conv_rate'],
        value_set_threshold='few'
    ).build_suite()
```
However, from our experience capabilities of automatic profiler are quite limited. So we would recommend crafting your own expectations:
```python
@ge_profiler
def manual_profiler(dataset: Dataset) -> ExpectationSuite:
    dataset.expect_column_max_to_be_between("column", 1, 2)
    return dataset.get_expectation_suite()
```



### Validating Training Dataset

During retrieval of historical features, `validation_reference` can be passed as a parameter to methods `.to_df(validation_reference=...)` or `.to_arrow(validation_reference=...)` of RetrievalJob.
If parameter is provided Feast will run validation once dataset is materialized. In case if validation successful materialized dataset is returned.
Otherwise, `feast.dqm.errors.ValidationFailed` exception would be raised. It will consist of all details for expectations that didn't pass.
If validation is successful, the materialized dataset is returned. Otherwise, a `feast.dqm.errors.ValidationFailed` exception is raised with details of the expectations that didn't pass.

```python
from feast import FeatureStore
@@ -75,3 +58,32 @@ job.to_df(
        .as_reference(profiler=manual_profiler)
)
```

---

## Migration Guide

The new [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md) system replaces this integration with:

| Capability | GE-based (deprecated) | New DQM |
|---|---|---|
| Scope | Historical retrieval only | Batch data + serving logs |
| Dependencies | `feast[ge]` extra required | No extra dependencies |
| Metrics | User-defined expectations | Automatic: null rates, percentiles, histograms, drift |
| UI | None | Built-in monitoring dashboard |
| Automation | Manual profiler code | `feast monitor run` CLI + REST API |
| Backends | Limited | All offline store backends |

To migrate:

1. Enable DQM in `feature_store.yaml`:
   ```yaml
   data_quality_monitoring:
     auto_baseline: true
   ```

2. Run `feast apply` to compute baseline metrics automatically.

3. Schedule `feast monitor run` for ongoing monitoring.

4. Remove the `feast[ge]` dependency from your requirements.
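
Step 3 leaves the choice of scheduler open. One possible shape, sketched here under stated assumptions, is a small Python wrapper that a cron job or workflow scheduler could call. Only the `feast monitor run` command comes from this guide; the `run_monitor` helper, its `dry_run` switch, and the repository path are hypothetical, and no extra CLI flags are assumed.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# The command named in the migration steps; no additional flags are assumed.
MONITOR_COMMAND: List[str] = ["feast", "monitor", "run"]


def run_monitor(repo_path: str, dry_run: bool = False) -> Optional[int]:
    """Run `feast monitor run` from inside the feature repository.

    Hypothetical helper: returns the process exit code, or None when
    dry_run is True and nothing is executed.
    """
    repo = Path(repo_path)
    if dry_run:
        print(f"would run {' '.join(MONITOR_COMMAND)} in {repo}")
        return None
    # cwd makes the CLI pick up feature_store.yaml from the repository.
    completed = subprocess.run(MONITOR_COMMAND, cwd=repo, check=False)
    return completed.returncode


if __name__ == "__main__":
    # Placeholder path; point this at your own feature repository.
    run_monitor("path/to/feature_repo", dry_run=True)
```

Returning the exit code instead of raising lets the calling scheduler decide how to alert on a failed run.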
3 changes: 2 additions & 1 deletion docs/roadmap.md
@@ -89,7 +89,8 @@ The list below contains the functionality that contributors are planning to deve
* [x] [Offline Feature Server (alpha)](https://docs.feast.dev/reference/feature-servers/offline-feature-server)
* [x] [Registry server (alpha)](https://github.com/feast-dev/feast/blob/master/docs/reference/feature-servers/registry-server.md)
* **Data Quality Management (See [RFC](https://docs.google.com/document/d/110F72d4NTv80p35wDSONxhhPBqWRwbZXG4f9mNEMd98/edit))**
* [x] Data profiling and validation (Great Expectations)
* [x] ~~Data profiling and validation (Great Expectations)~~ (deprecated)
* [x] [Feature Quality Monitoring](https://docs.feast.dev/how-to-guides/feature-monitoring) — built-in metrics, drift detection, serving log monitoring, and UI dashboard
* **Feature Discovery and Governance**
* [x] Python SDK for browsing feature registry
* [x] CLI for browsing feature registry
4 changes: 4 additions & 0 deletions docs/tutorials/validating-historical-features.md
@@ -1,5 +1,9 @@
# Validating historical features with Great Expectations

{% hint style="warning" %}
**Deprecated:** This tutorial demonstrates the legacy Great Expectations-based validation, which is deprecated. For new projects, use Feast's built-in [Feature Quality Monitoring](../how-to-guides/feature-monitoring.md) system, which provides automatic metrics computation, drift detection, and a monitoring UI with no external dependencies required. See also the [Monitoring Quickstart notebook](../../examples/monitoring/monitoring-quickstart.ipynb).
{% endhint %}

In this tutorial, we will use the public dataset of Chicago taxi trips to present data validation capabilities of Feast.
- The original dataset is stored in BigQuery and consists of raw data for each taxi trip (one row per trip) since 2013.
- We will generate several training datasets (aka historical features in Feast) for different periods and evaluate expectations made on one dataset against another.