feast-dev · ntkathole · Mar 28, 2026 · Mar 29, 2026
diff --git a/community/governance.md b/community/governance.md
@@ -217,8 +217,9 @@ The "RFC" (request for comments) process is intended to provide a consistent and
 2. Users, Contributors, and Maintainers discuss and upvote the draft
 3. If confident on its success, contributor completes the RFC with more in-detail technical specifications
 4. Maintainers approve RFC when it is ready
-5. Maintainers meet every quarter and choose three or five items based on popularity and alignment with project vision and goals
-6. Those selected items become part of the Mid-term goals
+5. Once finalized, the RFC should be added as an [Architecture Decision Record (ADR)](../docs/adr/README.md) in the repository
+6. Maintainers meet every quarter and choose three or five items based on popularity and alignment with project vision and goals
+7. Those selected items become part of the Mid-term goals
 
 
 ### When to Use RFCs

@@ -181,3 +181,15 @@
 * [Versioning policy](project/versioning-policy.md)
 * [Release process](project/release-process.md)
 * [Feast 0.9 vs Feast 0.10+](project/feast-0.9-vs-feast-0.10+.md)
+* [Architecture Decision Records](adr/README.md)
+  * [ADR-0001: Feature Services](adr/ADR-0001-feature-services.md)
+  * [ADR-0002: Component Refactor](adr/ADR-0002-component-refactor.md)
+  * [ADR-0003: On-Demand Transformations](adr/ADR-0003-on-demand-transformations.md)
+  * [ADR-0004: Entity Join Key Mapping](adr/ADR-0004-entity-join-key-mapping.md)
+  * [ADR-0005: Stream Transformations](adr/ADR-0005-stream-transformations.md)
+  * [ADR-0006: Kubernetes Operator](adr/ADR-0006-kubernetes-operator.md)
+  * [ADR-0007: Unified Feature Transformations](adr/ADR-0007-unified-feature-transformations.md)
+  * [ADR-0008: Feature View Versioning](adr/ADR-0008-feature-view-versioning.md)
+  * [ADR-0009: Contribution and Extensibility](adr/ADR-0009-contribution-extensibility.md)
+  * [ADR-0010: Vector Database Integration](adr/ADR-0010-vector-database-integration.md)
+  * [ADR-0011: Data Quality Monitoring](adr/ADR-0011-data-quality-monitoring.md)
@@ -0,0 +1,87 @@
+# ADR-0001: Feature Services
+
+## Status
+
+Accepted
+
+## Context
+
+Feast's Feature Views allowed for storage-level grouping of features based on how they are produced. However, there was no concept of a retrieval-level grouping of features that maps to models. Without this:
+
+- There was no way to track which features were used to train a model or serve a specific model.
+- Retrieving features during training required a complete list of features to be provided and persisted manually, which was error-prone.
+- There was no way to ensure consumers wouldn't face breaking changes when feature views changed.
+
+## Decision
+
+Introduce a `FeatureService` object that allows users to define which features to use for a specific ML use case. A feature service groups features from one or more feature views for model training and online serving.
+
+### API Design
+
+Feature services use a Pandas-like API where feature views can be referenced directly:
+
+```python
+from feast import FeatureService
+
+feature_service = FeatureService(
+    name="my_model_v1",
+    features=[
+        shop_raw,                                           # select all features
+        customer_sales[["average_order_value", "max_order_value"]],  # select specific features
+    ],
+)
+```
+
+Feature selection with aliasing:
+
+```python
+feature_service = FeatureService(
+    name="my_model_v1",
+    features=[
+        shop_raw,
+        customer_sales[["average_order_value", "max_order_value"]]
+            .alias({"average_order_value": "avg_o_val"}),
+    ],
+)
+```
+
+### Retrieval
+
+```python
+# Online inference
+row = store.get_online_features(
+    features=store.get_feature_service("my_model_v1"),
+    entity_rows=[{"customer_id": 123, "shop_id": 456}],
+).to_dict()
+
+# Training
+historical_df = store.get_historical_features(
+    features=store.get_feature_service("my_model_v1"),
+    entity_df=entity_df,
+).to_df()
+```
+
+### Key Decisions
+
+- **Name**: `FeatureService` was chosen over `FeatureSet` because it conveys the concept of a serving layer bridging models and data. `FeatureService` is analogous to model services in model serving systems.
+- **Mutability**: Feature services are mutable. Immutability may be considered in the future.
+- **Versioning**: Not included in the first version; users manage versions through naming conventions.
+
+## Consequences
+
+### Positive
+
+- Users can track which features are used for training and serving specific models.
+- Provides a consistent interface for both online and offline feature retrieval.
+- Reduces error-prone manual feature list management.
+- Enables future functionality like logging, monitoring, and endpoint provisioning.
+
+### Negative
+
+- Adds another abstraction layer to the Feast data model.
+- Feature services are mutable, which may lead to inconsistencies if not carefully managed.
+
+## References
+
+- Original RFC: [Feast RFC-015: Feature Services](https://docs.google.com/document/d/1jC0RJbyYLilXTOrLVBeR22PYLK5fe2JmQK1mKdZ-eno/edit)
+- Implementation: `sdk/python/feast/feature_service.py`
@@ -0,0 +1,71 @@
+# ADR-0002: Component Refactor
+
+## Status
+
+Accepted
+
+## Context
+
+The Feast project originally existed as a single monolithic repository containing many tightly coupled components: Core Registry, Serving Service, Job Service, Client Libraries, Spark ingestion code, Helm charts, and Terraform configurations.
+
+Two distinct user groups were identified:
+
+- **Platform teams**: Capable of running a complete feature store on Kubernetes with Spark, managing large-scale infrastructure.
+- **Solution teams**: Small data science or data engineering teams wanting to solve ML business problems without deploying and managing Kubernetes or Spark clusters.
+
+Delivering a minimum viable product to solution teams required a lighter-weight approach. However, tight coupling between components in the monolithic codebase made this difficult.
+
+## Decision
+
+Adopt a staged approach to decouple the Feast codebase into modular, composable components:
+
+### Stage 1: Move Out Non-Core Components
+
+Split the monorepo into focused repositories:
+
+- **feast** (main repo): Feast Python SDK, Documentation, and Protos (starting at v0.10.0).
+- **feast-java**: Core Registry, Serving, and Java Client.
+- **feast-spark**: Spark Ingestion, Spark Python SDK, and Job Service.
+- **feast-helm-charts**: Helm charts for Kubernetes deployments.
+
+### Stage 2: Document Contracts
+
+Document all component-level contracts (I/O), API specifications (Protobuf), data contracts, and architecture diagrams.
+
+### Stage 3: Remove Coupling
+
+Remove unnecessary coupling between components, keeping only service contracts (Protobuf), data contracts, and integration tests as shared dependencies.
+
+### Stage 4: Converge
+
+Reverse the relationship so the main Feast SDK can use Spark-related code as a specific compute provider, rather than requiring it.
+
+### Key Principles
+
+- The main Feast repository provides a fully functional Python-based feature store that works without infrastructure dependencies.
+- Spark and Kubernetes-based components remain available for platform teams.
+- All existing functionality is maintained with no breaking changes during the transition.
+
+## Consequences
+
+### Positive
+
+- Enabled a lightweight core framework that teams can start using without provisioning infrastructure.
+- Made it possible for teams to pick and choose components they want to adopt.
+- Teams with existing internal implementations (ingestion, registry, serving) can integrate more easily.
+- The Python SDK became the primary entry point, significantly lowering the barrier to getting started.
+
+### Negative
+
+- Temporary divergence between Feast and Feast-Spark codebases during the transition.
+- Multiple repositories added coordination overhead during the migration period.
+
+### Neutral
+
+- Components have since been reconverged into the main repository with a cleaner separation of concerns.
+- The Go, Java, and Python SDKs coexist in the main repository under separate directories.
+
+## References
+
+- Original RFC: [Feast RFC-020: Component Refactor](https://docs.google.com/document/d/1CjR3Ph3l65hF5bRuchR9u9WSoirnIuEb7ILY9Ioh1Sk/edit)
+- GitHub Discussion: [#1353](https://github.com/feast-dev/feast/discussions/1353)
@@ -0,0 +1,97 @@
+# ADR-0003: On-Demand Transformations
+
+## Status
+
+Accepted
+
+## Context
+
+For many ML use cases, it is not possible or feasible to precompute and persist feature values for serving:
+
+- **Transactional use cases**: Inputs are part of the transaction/booking/order event.
+- **Clickstream use cases**: User event data contains raw data used for feature engineering.
+- **Location-based use cases**: Distance calculations between feature views (e.g., customer and driver locations).
+- **Time-dependent features**: e.g., `user_account_age = current_time - account_creation_time`.
+- **Crossed features**: e.g., user-user, user-tweet based features where the keyspace is too large to precompute.
+
+Additionally, Feast provided no means of post-processing features, which pushed all feature development into upstream systems.
+
+## Decision
+
+Introduce **On-Demand Feature Views** as a feature transformation layer with the following properties:
+
+- Transformations execute at retrieval time (post-processing step after reading from the store).
+- The calling client can input data as part of the retrieval request via a `RequestSource`.
+- Users define arbitrary transformations on both stored features and request-time input data.
+- Transformations are row-level operations only (no aggregations).
+
+### Definition API
+
+On-demand feature views are defined with the `@on_demand_feature_view` decorator, which was Option 3 in the RFC:
+
+```python
+import pandas as pd
+from feast import Field, RequestSource, on_demand_feature_view
+from feast.types import Float64
+
+input_request = RequestSource(
+    name="transaction",
+    schema=[Field(name="input_lat", dtype=Float64), Field(name="input_lon", dtype=Float64)],
+)
+
+@on_demand_feature_view(
+    sources=[driver_fv, input_request],
+    schema=[Field(name="distance", dtype=Float64)],
+)
+def driver_distance(inputs: pd.DataFrame) -> pd.DataFrame:
+    from haversine import haversine
+    df = pd.DataFrame()
+    df["distance"] = inputs.apply(
+        lambda r: haversine((r["lat"], r["lon"]), (r["input_lat"], r["input_lon"])),
+        axis=1,
+    )
+    return df
+```
+
+### Retrieval
+
+```python
+# Online - request data passed as entity rows
+features = store.get_online_features(
+    features=["driver_distance:distance"],
+    entity_rows=[{"driver_id": 1001, "input_lat": 1.234, "input_lon": 5.678}],
+).to_dict()
+
+# Offline - request data columns included in entity_df
+df = store.get_historical_features(
+    entity_df=entity_df_with_request_columns,
+    features=["driver_distance:distance"],
+).to_df()
+```
+
+### Key Decisions
+
+- **Decorator approach** chosen over adding transforms to FeatureService or FeatureView directly. This avoids changing existing APIs and keeps transformations self-contained.
+- **Pandas DataFrames** as the input/output type to support vectorized operations.
+- **All imports must be self-contained** within the function block for serialization.
+- **Offline transformations** initially execute client-side using Dask for scalability.
+- **Feature Transformation Server (FTS)** handles online transformations via HTTP/REST, deployed at `apply` time.
+
+## Consequences
+
+### Positive
+
+- Enables real-time feature engineering that depends on request-time data.
+- Keeps feature logic co-located with feature definitions in the repository.
+- Provides a consistent interface for both online and offline feature retrieval.
+- The FTS allows horizontal scaling independent of feature serving.
+
+### Negative
+
+- Adds computational overhead to the serving path since transformations run at read time.
+- On-demand feature views are limited to row-level transformations (no aggregations).
+- Python function serialization requires self-contained imports within function blocks.
+
+## References
+
+- Original RFC: [Feast RFC-021: On-Demand Transformations](https://docs.google.com/document/d/1lgfIw0Drc65LpaxbUu49RCeJgMew547meSJttnUqz7c/edit)
+- Implementation: `sdk/python/feast/on_demand_feature_view.py`
@@ -0,0 +1,78 @@
+# ADR-0004: Entity Join Key Mapping
+
+## Status
+
+Accepted
+
+## Context
+
+Several entity keys in the source data may need to map to the same entity in the feature table during a join. For example, `spammer_id` and `reporter_id` may both need the `years_on_platform` feature from a table keyed by `user_id`.
+
+Without entity join key mapping:
+
+- Users had to rename columns in their entity dataframe to match the feature view's join key before retrieval.
+- It was impossible to join a feature view twice on two different columns in the entity data (e.g., getting user features for both `spammer_id` and `reporter_id` in the same query).
+
+### Example
+
+Entity source data:
+
+| spammer_id | reporter_id | timestamp  |
+|------------|-------------|------------|
+| 2          | 8           | 1629909366 |
+| 1          | 2           | 1629909323 |
+
+Desired joined data should include `spammer_feature_a` and `reporter_feature_a`, both sourced from the same `user` feature view but joined on different keys.
+
+## Decision
+
+Implement join key overrides using a `with_join_key_map()` method on feature views, combined with `with_name()` for disambiguation. This was **Option 8b** from the RFC.
+
+### API
+
+```python
+abuse_feature_service = FeatureService(
+    name="my_abuse_model_v1",
+    features=[
+        user_features
+            .with_name("reporter_features")
+            .with_join_key_map({"user_id": "reporter_id"}),
+        user_features
+            .with_name("spammer_features")
+            .with_join_key_map({"user_id": "spammer_id"}),
+    ],
+)
+```
+
+### Key Decisions
+
+- **Query-time mapping** rather than registration-time. This provides flexibility since the same feature view can be used with different mappings in different contexts.
+- **Join key level mapping** rather than entity-level mapping. While entity-level mapping (Option 10) better preserves abstraction boundaries, join key mapping is more flexible and doesn't require registering additional entities.
+- **`with_name()` required** when using the same feature view multiple times to avoid output column name collisions. If omitted, a name collision error is raised.
+- **Mappings replace the default entirely**: specifying a mapping replaces the default join behavior rather than extending it. To keep the original join key, list it explicitly in the mapping.
+
+### Implementation
+
+- **Offline (historical) retrieval**: After feature subtable cleaning and dedup, entity columns are renamed based on the mapping before the join.
+- **Online retrieval**: Shadow entity keys are translated to the original join key for the online store lookup, then results are remapped to the shadow entity names.
+- The `join_key_map` is stored on `FeatureViewProjection` and flows through both online and offline retrieval paths.
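+
+As a rough illustration of the offline path, the sketch below uses plain pandas rather than Feast itself. The helper `join_with_key_map`, the `years_on_platform` values, and the `name__feature` output column naming are assumptions made for this example, and the point-in-time logic of a real historical join is omitted.
+
+```python
+import pandas as pd
+
+
+def join_with_key_map(entity_df, feature_df, join_key_map, name):
+    """Hypothetical helper (not Feast's implementation): join one projection."""
+    # Rename the feature table's join key to the entity column, e.g. user_id -> reporter_id.
+    renamed = feature_df.rename(columns=join_key_map)
+    keys = list(join_key_map.values())
+    # Prefix feature columns with the projection name to avoid collisions.
+    renamed = renamed.rename(
+        columns={c: f"{name}__{c}" for c in renamed.columns if c not in keys}
+    )
+    return entity_df.merge(renamed, on=keys, how="left")
+
+
+# Entity rows from the example table above (timestamps left out for brevity).
+entity_df = pd.DataFrame({"spammer_id": [2, 1], "reporter_id": [8, 2]})
+# Made-up feature values keyed by user_id.
+user_features = pd.DataFrame({"user_id": [1, 2, 8], "years_on_platform": [3, 5, 1]})
+
+joined = join_with_key_map(
+    entity_df, user_features, {"user_id": "reporter_id"}, "reporter_features"
+)
+joined = join_with_key_map(
+    joined, user_features, {"user_id": "spammer_id"}, "spammer_features"
+)
+```
+
+Each projection is joined on its own entity column, so the same `user_id`-keyed table contributes one feature column for the reporter and another for the spammer without renaming anything in the entity dataframe.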
+
+## Consequences
+
+### Positive
+
+- Users can join the same feature view on different entity columns in a single query.
+- No need to register additional entities or manually rename columns before retrieval.
+- Works consistently across both online and offline retrieval.
+- Feature view definitions remain clean and reusable.
+
+### Negative
+
+- Adds complexity to the retrieval path with column renaming logic.
+- Users must remember to use `with_name()` to avoid collisions when joining the same feature view multiple times.
+
+## References
+
+- Original RFC: [Feast RFC-023: Shadow Entities Mapping](https://docs.google.com/document/d/1TsCwKf3nVXTAfL0f8i26jnCgHA3bRd4dKQ8QdM87vIA/edit)
+- GitHub Issue: [#1762](https://github.com/feast-dev/feast/issues/1762)
+- Implementation: `sdk/python/feast/feature_view.py` (`with_join_key_map` method), `sdk/python/feast/feature_view_projection.py` (`join_key_map` field)