5 changes: 3 additions & 2 deletions community/governance.md
@@ -217,8 +217,9 @@ The "RFC" (request for comments) process is intended to provide a consistent and
2. Users, Contributors, and Maintainers discuss and upvote the draft
3. If confident on its success, contributor completes the RFC with more in-detail technical specifications
4. Maintainers approve RFC when it is ready
-5. Maintainers meet every quarter and choose three or five items based on popularity and alignment with project vision and goals
-6. Those selected items become part of the Mid-term goals
+5. Once finalized, the RFC should be added as an [Architecture Decision Record (ADR)](../docs/adr/README.md) in the repository
+6. Maintainers meet every quarter and choose three or five items based on popularity and alignment with project vision and goals
+7. Those selected items become part of the Mid-term goals


### When to Use RFCs
12 changes: 12 additions & 0 deletions docs/SUMMARY.md
@@ -181,3 +181,15 @@
* [Versioning policy](project/versioning-policy.md)
* [Release process](project/release-process.md)
* [Feast 0.9 vs Feast 0.10+](project/feast-0.9-vs-feast-0.10+.md)
* [Architecture Decision Records](adr/README.md)
* [ADR-0001: Feature Services](adr/ADR-0001-feature-services.md)
* [ADR-0002: Component Refactor](adr/ADR-0002-component-refactor.md)
* [ADR-0003: On-Demand Transformations](adr/ADR-0003-on-demand-transformations.md)
* [ADR-0004: Entity Join Key Mapping](adr/ADR-0004-entity-join-key-mapping.md)
* [ADR-0005: Stream Transformations](adr/ADR-0005-stream-transformations.md)
* [ADR-0006: Kubernetes Operator](adr/ADR-0006-kubernetes-operator.md)
* [ADR-0007: Unified Feature Transformations](adr/ADR-0007-unified-feature-transformations.md)
* [ADR-0008: Feature View Versioning](adr/ADR-0008-feature-view-versioning.md)
* [ADR-0009: Contribution and Extensibility](adr/ADR-0009-contribution-extensibility.md)
* [ADR-0010: Vector Database Integration](adr/ADR-0010-vector-database-integration.md)
* [ADR-0011: Data Quality Monitoring](adr/ADR-0011-data-quality-monitoring.md)
87 changes: 87 additions & 0 deletions docs/adr/ADR-0001-feature-services.md
@@ -0,0 +1,87 @@
# ADR-0001: Feature Services

## Status

Accepted

## Context

Feast's Feature Views allowed for storage-level grouping of features based on how they are produced. However, there was no concept of a retrieval-level grouping of features that maps to models. Without this:

- There was no way to track which features were used to train a model or serve a specific model.
- Retrieving features during training required a complete list of features to be provided and persisted manually, which was error-prone.
- There was no way to ensure consumers wouldn't face breaking changes when feature views changed.

## Decision

Introduce a `FeatureService` object that allows users to define which features to use for a specific ML use case. A feature service groups features from one or more feature views for model training and online serving.

### API Design

Feature services use a Pandas-like API where feature views can be referenced directly:

```python
from feast import FeatureService

feature_service = FeatureService(
    name="my_model_v1",
    features=[
        shop_raw,  # select all features
        customer_sales[["average_order_value", "max_order_value"]],  # select specific features
    ],
)
```

Feature selection with aliasing:

```python
feature_service = FeatureService(
    name="my_model_v1",
    features=[
        shop_raw,
        customer_sales[["average_order_value", "max_order_value"]]
            .alias({"average_order_value": "avg_o_val"}),
    ],
)
```
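
To make the selection and aliasing semantics concrete, here is a minimal pure-Python sketch of how `view[[...]]` and `.alias(...)` could behave. It is an illustration only: `FeatureViewSketch`, `output_names`, and the `order_count` feature are hypothetical names introduced here, not Feast's classes or data.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass(frozen=True)
class FeatureViewSketch:
    """Hypothetical stand-in for a feature view; not the Feast class."""

    name: str
    features: List[str]
    aliases: Dict[str, str] = field(default_factory=dict)

    def __getitem__(self, selected: List[str]) -> "FeatureViewSketch":
        # view[["a", "b"]] returns a copy restricted to the named features.
        missing = [f for f in selected if f not in self.features]
        if missing:
            raise KeyError(f"{self.name} has no features {missing}")
        return replace(self, features=list(selected))

    def alias(self, mapping: Dict[str, str]) -> "FeatureViewSketch":
        # Record output names without changing the stored feature names.
        return replace(self, aliases={**self.aliases, **mapping})

    def output_names(self) -> List[str]:
        # Names a consumer would see: the alias if one exists, else the original.
        return [self.aliases.get(f, f) for f in self.features]


customer_sales = FeatureViewSketch(
    "customer_sales",
    ["average_order_value", "max_order_value", "order_count"],
)
selected = customer_sales[["average_order_value", "max_order_value"]].alias(
    {"average_order_value": "avg_o_val"}
)
print(selected.output_names())
```

Returning a new object from each call, rather than mutating the view, keeps the original definition reusable across several feature services.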

### Retrieval

```python
# Online inference
row = store.get_online_features(
    feature_service="my_model_v1",
    entity_rows=[{"customer_id": 123, "shop_id": 456}],
).to_dict()

# Training
historical_df = store.get_historical_features(
    feature_service="my_model_v1",
    entity_df=entity_df,
)
```

### Key Decisions

- **Name**: `FeatureService` was chosen over `FeatureSet` because it conveys the concept of a serving layer bridging models and data. `FeatureService` is analogous to model services in model serving systems.
- **Mutability**: Feature services are mutable. Immutability may be considered in the future.
- **Versioning**: Not included in the first version; users manage versions through naming conventions.
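
Since versioning is left to naming conventions, a team might standardize on a `_vN` suffix such as `my_model_v1`. The helper below is a hypothetical sketch of that convention; `next_version_name` is not part of Feast, and the suffix format is an assumption.

```python
import re


def next_version_name(name: str) -> str:
    """Return the next name under an assumed `<base>_v<N>` convention."""
    match = re.fullmatch(r"(?P<base>.+)_v(?P<num>\d+)", name)
    if match is None:
        # Unversioned names start at v1.
        return f"{name}_v1"
    return f"{match['base']}_v{int(match['num']) + 1}"


print(next_version_name("my_model_v1"))
```

Registering a new name instead of editing the existing service leaves consumers of the old version unaffected, which partly compensates for feature services being mutable.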

## Consequences

### Positive

- Users can track which features are used for training and serving specific models.
- Provides a consistent interface for both online and offline feature retrieval.
- Reduces error-prone manual feature list management.
- Enables future functionality like logging, monitoring, and endpoint provisioning.

### Negative

- Adds another abstraction layer to the Feast data model.
- Feature services are mutable, which may lead to inconsistencies if not carefully managed.

## References

- Original RFC: [Feast RFC-015: Feature Services](https://docs.google.com/document/d/1jC0RJbyYLilXTOrLVBeR22PYLK5fe2JmQK1mKdZ-eno/edit)
- Implementation: `sdk/python/feast/feature_service.py`
71 changes: 71 additions & 0 deletions docs/adr/ADR-0002-component-refactor.md
@@ -0,0 +1,71 @@
# ADR-0002: Component Refactor

## Status

Accepted

## Context

The Feast project originally existed as a single monolithic repository containing many tightly coupled components: Core Registry, Serving Service, Job Service, Client Libraries, Spark ingestion code, Helm charts, and Terraform configurations.

Two distinct user groups were identified:

- **Platform teams**: Capable of running a complete feature store on Kubernetes with Spark, managing large-scale infrastructure.
- **Solution teams**: Small data science or data engineering teams wanting to solve ML business problems without deploying and managing Kubernetes or Spark clusters.

Delivering a viable minimal product to solution teams required a lighter-weight approach. However, the monolithic codebase made this difficult due to tight coupling between components.

## Decision

Adopt a staged approach to decouple the Feast codebase into modular, composable components:

### Stage 1: Move Out Non-Core Components

Split the monorepo into focused repositories:

- **feast** (main repo): Feast Python SDK, Documentation, and Protos (starting at v0.10.0).
- **feast-java**: Core Registry, Serving, and Java Client.
- **feast-spark**: Spark Ingestion, Spark Python SDK, and Job Service.
- **feast-helm-charts**: Helm charts for Kubernetes deployments.

### Stage 2: Document Contracts

Document all component-level contracts (I/O), API specifications (Protobuf), data contracts, and architecture diagrams.

### Stage 3: Remove Coupling

Remove unnecessary coupling between components, keeping only service contracts (Protobuf), data contracts, and integration tests as shared dependencies.

### Stage 4: Converge

Reverse the relationship so the main Feast SDK can use Spark-related code as a specific compute provider, rather than requiring it.
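
One way to picture this inversion is a small provider registry: the core SDK depends only on an interface, and a Spark-backed implementation registers itself when installed. The sketch below is an assumption about the shape of such a design, not Feast's actual provider API; `ComputeProvider`, `LocalProvider`, `register_provider`, and `get_provider` are hypothetical names.

```python
from typing import Callable, Dict, List, Protocol

Row = Dict[str, float]


class ComputeProvider(Protocol):
    """Interface the core SDK depends on (hypothetical)."""

    def materialize(self, rows: List[Row]) -> List[Row]: ...


class LocalProvider:
    """Runs in-process, so no cluster is required."""

    def materialize(self, rows: List[Row]) -> List[Row]:
        return [dict(row) for row in rows]


# The core ships with only the local provider registered.
PROVIDERS: Dict[str, Callable[[], ComputeProvider]] = {"local": LocalProvider}


def register_provider(name: str, factory: Callable[[], ComputeProvider]) -> None:
    # An optional package (for example a Spark integration) would call this.
    PROVIDERS[name] = factory


def get_provider(name: str = "local") -> ComputeProvider:
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(
            f"unknown provider {name!r}; registered: {sorted(PROVIDERS)}"
        ) from None


print(get_provider().materialize([{"trips": 3.0}]))
```

With this arrangement the dependency points from the optional component to the core, so solution teams never import Spark code they do not use.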

### Key Principles

- The main Feast repository provides a fully functional Python-based feature store that works without infrastructure dependencies.
- Spark and Kubernetes-based components remain available for platform teams.
- All existing functionality is maintained with no breaking changes during the transition.

## Consequences

### Positive

- Enabled a lightweight core framework that teams can start using without deploying any infrastructure.
- Made it possible for teams to pick and choose components they want to adopt.
- Teams with existing internal implementations (ingestion, registry, serving) can integrate more easily.
- The Python SDK became the primary entry point, significantly lowering the barrier to getting started.

### Negative

- Temporary divergence between Feast and Feast-Spark codebases during the transition.
- Multiple repositories added coordination overhead during the migration period.

### Neutral

- Components have since been reconverged into the main repository with a cleaner separation of concerns.
- The Go, Java, and Python SDKs coexist in the main repository under separate directories.

## References

- Original RFC: [Feast RFC-020: Component Refactor](https://docs.google.com/document/d/1CjR3Ph3l65hF5bRuchR9u9WSoirnIuEb7ILY9Ioh1Sk/edit)
- GitHub Discussion: [#1353](https://github.com/feast-dev/feast/discussions/1353)
97 changes: 97 additions & 0 deletions docs/adr/ADR-0003-on-demand-transformations.md
@@ -0,0 +1,97 @@
# ADR-0003: On-Demand Transformations

## Status

Accepted

## Context

For many ML use cases, it is not possible or feasible to precompute and persist feature values for serving:

- **Transactional use cases**: Inputs are part of the transaction/booking/order event.
- **Clickstream use cases**: User event data contains raw data used for feature engineering.
- **Location-based use cases**: Distance calculations between feature views (e.g., customer and driver locations).
- **Time-dependent features**: e.g., `user_account_age = current_time - account_creation_time`.
- **Crossed features**: e.g., user-user, user-tweet based features where the keyspace is too large to precompute.

Additionally, Feast did not provide a means of post-processing features, which pushed all feature development into upstream systems.

## Decision

Introduce **On-Demand Feature Views** as a feature transformation layer with the following properties:

- Transformations execute at retrieval time (post-processing step after reading from the store).
- The calling client can input data as part of the retrieval request via a `RequestSource`.
- Users define arbitrary transformations on both stored features and request-time input data.
- Transformations are row-level operations only (no aggregations).

### Definition API

On-demand feature views are defined with the `@on_demand_feature_view` decorator, which was Option 3 in the RFC:

```python
import pandas as pd
from feast import on_demand_feature_view, Field, RequestSource
from feast.types import Float64

# Request-time inputs supplied by the caller.
input_request = RequestSource(
    name="transaction",
    schema=[Field(name="input_lat", dtype=Float64), Field(name="input_lon", dtype=Float64)],
)

# driver_fv is assumed to be a feature view defined elsewhere with "lat" and "lon" features.
@on_demand_feature_view(
    sources=[driver_fv, input_request],
    schema=[Field(name="distance", dtype=Float64)],
)
def driver_distance(inputs: pd.DataFrame) -> pd.DataFrame:
    # Imported inside the function so the serialized function is self-contained.
    from haversine import haversine
    df = pd.DataFrame()
    df["distance"] = inputs.apply(
        lambda r: haversine((r["lat"], r["lon"]), (r["input_lat"], r["input_lon"])),
        axis=1,
    )
    return df
```

### Retrieval

```python
# Online - request data passed as entity rows
features = store.get_online_features(
    features=["driver_distance:distance"],
    entity_rows=[{"driver_id": 1001, "input_lat": 1.234, "input_lon": 5.678}],
).to_dict()

# Offline - request data columns included in entity_df
df = store.get_historical_features(
    entity_df=entity_df_with_request_columns,
    features=["driver_distance:distance"],
).to_df()
```

### Key Decisions

- **Decorator approach** chosen over adding transforms to FeatureService or FeatureView directly. This avoids changing existing APIs and keeps transformations self-contained.
- **Pandas DataFrames** as the input/output type to support vectorized operations.
- **All imports must be self-contained** within the function block for serialization.
- **Offline transformations** initially execute client-side using Dask for scalability.
- **Feature Transformation Server (FTS)** handles online transformations via HTTP/REST, deployed at `apply` time.
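
The self-contained import rule is easier to see with a simplified illustration: if only a function's source travels to another process, any name it uses must be resolvable there. The sketch below rebuilds functions from source text in an empty namespace. This is an assumption made for illustration, not how Feast serializes on-demand feature views; `rebuild` and `account_age_days` are hypothetical names, with the latter modelled on the time-dependent feature from the Context section.

```python
import textwrap

# Version with the import inside the function body.
SELF_CONTAINED = textwrap.dedent(
    '''
    def account_age_days(created_ts, now_ts):
        import math  # travels with the function
        return math.floor((now_ts - created_ts) / 86400)
    '''
)

# Version that relies on a module-level import that is not shipped.
RELIES_ON_GLOBAL = textwrap.dedent(
    '''
    def account_age_days(created_ts, now_ts):
        return math.floor((now_ts - created_ts) / 86400)
    '''
)


def rebuild(source: str, name: str):
    """Recreate a function from source in a fresh, empty namespace."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


good = rebuild(SELF_CONTAINED, "account_age_days")
print(good(0, 172800))

bad = rebuild(RELIES_ON_GLOBAL, "account_age_days")
try:
    bad(0, 172800)
except NameError as err:
    print(f"fails on the receiving side: {err}")
```

The second function is defined without error and only fails when called, which is why a missing import can go unnoticed until the transformation runs on the server.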

## Consequences

### Positive

- Enables real-time feature engineering that depends on request-time data.
- Keeps feature logic co-located with feature definitions in the repository.
- Provides a consistent interface for both online and offline feature retrieval.
- The FTS allows horizontal scaling independent of feature serving.

### Negative

- Adds computational overhead to the serving path since transformations run at read time.
- On-demand feature views are limited to row-level transformations (no aggregations).
- Python function serialization requires self-contained imports within function blocks.

## References

- Original RFC: [Feast RFC-021: On-Demand Transformations](https://docs.google.com/document/d/1lgfIw0Drc65LpaxbUu49RCeJgMew547meSJttnUqz7c/edit)
- Implementation: `sdk/python/feast/on_demand_feature_view.py`
78 changes: 78 additions & 0 deletions docs/adr/ADR-0004-entity-join-key-mapping.md
@@ -0,0 +1,78 @@
# ADR-0004: Entity Join Key Mapping

## Status

Accepted

## Context

Several distinct entity keys in the source data may need to map onto the same entity in the feature data table during a join. For example, `spammer_id` and `reporter_id` may both need the `years_on_platform` feature from a table keyed by `user_id`.

Without entity join key mapping:

- Users had to rename columns in their entity dataframe to match the feature view's join key before retrieval.
- It was impossible to join a feature view twice on two different columns in the entity data (e.g., getting user features for both `spammer_id` and `reporter_id` in the same query).

### Example

Entity source data:

| spammer_id | reporter_id | timestamp |
|------------|-------------|------------|
| 2 | 8 | 1629909366 |
| 1 | 2 | 1629909323 |

Desired joined data should include `spammer_feature_a` and `reporter_feature_a`, both sourced from the same `user` feature view but joined on different keys.

## Decision

Implement join key overrides using a `with_join_key_map()` method on feature views, combined with `with_name()` for disambiguation. This was **Option 8b** from the RFC.

### API

```python
abuse_feature_service = FeatureService(
name="my_abuse_model_v1",
features=[
user_features
.with_name("reporter_features")
.with_join_key_map({"user_id": "reporter_id"}),
user_features
.with_name("spammer_features")
.with_join_key_map({"user_id": "spammer_id"}),
],
)
```

### Key Decisions

- **Query-time mapping** rather than registration-time. This provides flexibility since the same feature view can be used with different mappings in different contexts.
- **Join key level mapping** rather than entity-level mapping. While entity-level mapping (Option 10) better preserves abstraction boundaries, join key mapping is more flexible and doesn't require registering additional entities.
- **`with_name()` required** when using the same feature view multiple times to avoid output column name collisions. If omitted, a name collision error is raised.
- **Mapping replaces the default entirely**: specifying a mapping replaces the default join behavior rather than extending it. If the original join key should still be used, it must be listed explicitly.

### Implementation

- **Offline (historical) retrieval**: After the feature subtable is cleaned and deduplicated, entity columns are renamed according to the mapping before the join.
- **Online retrieval**: Shadow entity keys are translated to the original join key for the online store lookup, then results are remapped to the shadow entity names.
- The `join_key_map` is stored on `FeatureViewProjection` and flows through both online and offline retrieval paths.
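
The online path described above can be sketched in plain Python: translate the shadow key to the original join key, look up the row, then prefix the results with the projection name. This is a minimal sketch under stated assumptions, not Feast's implementation. `ONLINE_STORE`, `get_mapped_features`, the `years_on_platform` values, and the `name__feature` output format are all hypothetical.

```python
from typing import Any, Dict

# Hypothetical online store: feature view -> user_id -> feature values.
ONLINE_STORE: Dict[str, Dict[int, Dict[str, Any]]] = {
    "user_features": {
        1: {"years_on_platform": 4},
        2: {"years_on_platform": 1},
        8: {"years_on_platform": 7},
    }
}


def get_mapped_features(
    view: str,
    output_name: str,
    join_key_map: Dict[str, str],
    entity_row: Dict[str, int],
) -> Dict[str, Any]:
    """Look up one row for a view joined on a shadow key.

    join_key_map maps the original join key to the shadow column,
    for example {"user_id": "reporter_id"}.
    """
    # Translate shadow keys in the request back to the original join key.
    original = {orig: entity_row[shadow] for orig, shadow in join_key_map.items()}
    # This sketch supports a single join key per view.
    (key_value,) = original.values()
    stored = ONLINE_STORE[view][key_value]
    # Prefix with the projection name so repeated views do not collide.
    return {f"{output_name}__{feat}": value for feat, value in stored.items()}


row = {"spammer_id": 2, "reporter_id": 8}
result = {
    **get_mapped_features("user_features", "reporter_features", {"user_id": "reporter_id"}, row),
    **get_mapped_features("user_features", "spammer_features", {"user_id": "spammer_id"}, row),
}
print(result)
```

Without the distinct `output_name` values, both lookups would write to the same output key and one result would silently overwrite the other, which is the collision that `with_name()` exists to prevent.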

## Consequences

### Positive

- Users can join the same feature view on different entity columns in a single query.
- No need to register additional entities or manually rename columns before retrieval.
- Works consistently across both online and offline retrieval.
- Feature view definitions remain clean and reusable.

### Negative

- Adds complexity to the retrieval path with column renaming logic.
- Users must remember to use `with_name()` to avoid collisions when joining the same feature view multiple times.

## References

- Original RFC: [Feast RFC-023: Shadow Entities Mapping](https://docs.google.com/document/d/1TsCwKf3nVXTAfL0f8i26jnCgHA3bRd4dKQ8QdM87vIA/edit)
- GitHub Issue: [#1762](https://github.com/feast-dev/feast/issues/1762)
- Implementation: `sdk/python/feast/feature_view.py` (`with_join_key_map` method), `sdk/python/feast/feature_view_projection.py` (`join_key_map` field)