This mechanism of retrieving features is only recommended as you're experimenting. Once you want to launch experiments or serve models, feature services are recommended.

- Feature references uniquely identify feature values in Feast. The structure of a feature reference in string form is as follows: `<feature_view>:<feature>`
+ Feature references uniquely identify feature values in Feast. The structure of a feature reference in string form is as follows: `<feature_view>[@version]:<feature>`
+
+ The `@version` part is optional. When omitted, the latest (active) version is used. You can specify a version like `@v2` to read from a specific historical version snapshot.
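
To make the reference grammar concrete, here is a minimal sketch of how a string of the form `<feature_view>[@version]:<feature>` could be split into its parts. The `FeatureRef` type, the `parse_feature_ref` helper, and the regular expression are hypothetical and written for illustration only; they are not Feast's parser, and the `v<N>` shape of the version tag is an assumption based on the `@v2` example.

```python
import re
from typing import NamedTuple, Optional


class FeatureRef(NamedTuple):
    """Illustrative container for a parsed feature reference (not a Feast type)."""
    feature_view: str
    version: Optional[str]  # None stands for "latest (active) version"
    feature: str


# Assumed grammar: <feature_view>[@v<N>]:<feature>
_REF_PATTERN = re.compile(
    r"^(?P<view>[^@:]+)(?:@(?P<version>v\d+))?:(?P<feature>[^@:]+)$"
)


def parse_feature_ref(ref: str) -> FeatureRef:
    """Split a feature reference string into view, optional version, and feature."""
    match = _REF_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"invalid feature reference: {ref!r}")
    # An omitted optional group is reported as None by the re module.
    return FeatureRef(match["view"], match["version"], match["feature"])
```

With this sketch, `parse_feature_ref("drivers_activity@v2:trips_today")` carries the version `"v2"`, while `parse_feature_ref("driver_locations:lon")` carries `None`, mirroring the "latest version by default" rule described above.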

Feature references are used for the retrieval of features from Feast:

```python
online_features = fs.get_online_features(
    features=[
-        'driver_locations:lon',
-        'drivers_activity:trips_today'
+        'driver_locations:lon',  # latest version (default)
+        'drivers_activity:trips_today',  # latest version (default)
+        'drivers_activity@v2:trips_today',  # specific version
Version-qualified reads (`@v<N>`) require `enable_online_feature_view_versioning: true` in your registry config and are currently supported only on the SQLite online store. See the [feature view versioning docs](feature-view.md#version-qualified-feature-references) for details.
+ {% endhint %}
+

It is possible to retrieve features from multiple feature views with a single request, and Feast is able to join features from multiple tables in order to build a training dataset. However, it is not possible to reference (or retrieve) features from multiple projects at the same time.
{% hint style="info" %}
@@ -107,6 +115,46 @@ The timestamp on which an event occurred, as found in a feature view's data sour
Event timestamps are used during point-in-time joins to ensure that the latest feature values are joined from feature views onto entity rows. Event timestamps are also used to ensure that old feature values aren't served to models during online serving.

+ #### Why `event_timestamp` is required in the entity dataframe
+
+ When calling `get_historical_features()`, the `entity_df` must include an `event_timestamp` column. This timestamp acts as the **upper bound (inclusive)** for which feature values are allowed to be retrieved for each entity row. Feast performs a point-in-time join (also called a "last known good value" temporal join): for each entity row, it retrieves the latest feature values with a timestamp **at or before** the entity row's `event_timestamp`.
+
+ This ensures **point-in-time correctness**, which is critical to prevent **data leakage** during model training. Without this constraint, features generated *after* the prediction time could leak into training data (effectively letting the model "see the future"), leading to inflated offline metrics that do not translate to real-world performance.
+
+ For example, if you want to predict whether a driver will be rated well on April 12 at 10:00 AM, the entity dataframe row should have `event_timestamp = datetime(2021, 4, 12, 10, 0, 0)`. Feast will then only join feature values observed on or before that time, excluding any data generated after 10:00 AM.
+
+ #### Retrieving features without an entity dataframe
+
+ While the entity dataframe is the standard way to retrieve historical features, Feast also supports **entity-less historical feature retrieval** by datetime range. This is useful when:
+
+ - You are training **time-series or population-level models** and don't have a pre-defined list of entity IDs.
+ - You want **all features in a time window** for exploratory analysis or batch training on full history.
+ - Constructing an entity dataframe upfront is unnecessarily complex or expensive.
+
+ Instead of passing `entity_df`, you specify a time window with `start_date` and/or `end_date`:
+
+ ```python
+ from datetime import datetime
+
+ training_df = store.get_historical_features(
+     features=[
+         "driver_hourly_stats:conv_rate",
+         "driver_hourly_stats:acc_rate",
+         "driver_hourly_stats:avg_daily_trips",
+     ],
+     start_date=datetime(2025, 7, 1),
+     end_date=datetime(2025, 7, 2),
+ ).to_df()
+ ```
+
+ If `start_date` is omitted, it defaults to `end_date` minus the feature view TTL. If `end_date` is omitted, it defaults to the current time. Point-in-time correctness is still preserved.
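
The defaulting rule for the time window can be sketched in a few lines. The `resolve_window` helper below is hypothetical and only restates the rule from the text (missing `end_date` becomes the current time, missing `start_date` becomes `end_date` minus the TTL); it is not Feast's implementation, and the `now` parameter exists only to make the example deterministic.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def resolve_window(
    start_date: Optional[datetime],
    end_date: Optional[datetime],
    ttl: timedelta,
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Fill in a missing start or end of the retrieval window (illustrative only)."""
    if end_date is None:
        # A missing end defaults to the current time.
        end_date = now if now is not None else datetime.now(timezone.utc)
    if start_date is None:
        # A missing start defaults to the end minus the feature view TTL.
        start_date = end_date - ttl
    if start_date > end_date:
        raise ValueError("start_date must not be later than end_date")
    return start_date, end_date
```

For instance, with a one-day TTL and only `end_date=datetime(2025, 7, 2)` supplied, the sketch yields the same window as the explicit call in the example above.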
+
+ {% hint style="warning" %}
+ Entity-less retrieval is currently supported for the **Postgres**, **Dask**, **Spark**, and **Ray** offline stores. You cannot mix `entity_df` with `start_date`/`end_date` in the same call.
+ {% endhint %}
+
+ For more details, see the [FAQ](../faq.md#how-do-i-run-get_historical_features-without-providing-an-entity-dataframe) and [this blog post on entity-less historical feature retrieval](https://feast.dev/blog/entity-less-historical-features-retrieval/).
+
### Dataset
A dataset is a collection of rows that is produced by a historical retrieval from Feast in order to train a model. A dataset is produced by a join from one or more feature views onto an entity dataframe. Therefore, a dataset may consist of features from multiple feature views.
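
To illustrate how such a dataset is assembled, the following is a minimal pure-Python sketch of a point-in-time join: each entity row receives the latest feature value whose timestamp is at or before the row's `event_timestamp`. The `point_in_time_join` function and the sample data are hypothetical, and the sketch deliberately ignores TTL and multi-column keys; Feast's offline stores implement the join in their own way.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(entity_rows, feature_rows, key, value):
    """Attach the latest feature value observed at or before each entity timestamp."""
    # Group feature rows by entity key, ordered by event timestamp.
    history = {}
    for row in sorted(feature_rows, key=lambda r: r["event_timestamp"]):
        history.setdefault(row[key], []).append(row)

    joined = []
    for entity in entity_rows:
        rows = history.get(entity[key], [])
        timestamps = [r["event_timestamp"] for r in rows]
        # bisect_right counts the rows with timestamp <= the entity timestamp,
        # so the upper bound is inclusive.
        idx = bisect_right(timestamps, entity["event_timestamp"])
        latest = rows[idx - 1][value] if idx else None
        joined.append({**entity, value: latest})
    return joined


feature_rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 9, 0), "conv_rate": 0.5},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 11, 0), "conv_rate": 0.9},
]
entity_rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 10, 0)},
    {"driver_id": 1002, "event_timestamp": datetime(2021, 4, 12, 10, 0)},
]
dataset = point_in_time_join(entity_rows, feature_rows, "driver_id", "conv_rate")
```

In this sample, driver 1001 is joined with the 9:00 value because the 11:00 value lies after the entity timestamp, and driver 1002 gets `None` because no feature rows exist for it.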
docs/getting-started/concepts/feature-view.md (19 additions, 0 deletions)
@@ -160,6 +160,25 @@ Feature names must be unique within a [feature view](feature-view.md#feature-vie
Each field can have additional metadata associated with it, specified as key-value [tags](https://rtd.feast.dev/en/master/feast.html#feast.field.Field).

+ ## \[Alpha\] Versioning
+
+ Feature views support automatic version tracking. Every time `feast apply` detects a schema or UDF change, a versioned snapshot is saved to the registry. This enables auditing what changed, reverting to a prior version, querying specific versions via `@v<N>` syntax, and staging new versions without promoting them.
+
+ Version history tracking is **always active** with no configuration needed. The `version` parameter is fully optional; omitting it preserves existing behavior.
+
+ ```python
+ # Pin to a specific version (reverts the active definition to v2's snapshot)
+ driver_stats = FeatureView(
+     name="driver_stats",
+     entities=[driver],
+     schema=[...],
+     source=my_source,
+     version="v2",
+ )
+ ```
+
+ For full details on version pinning, version-qualified reads, staged publishing (`--no-promote`), online store support, and known limitations, see the **[\[Alpha\] Feature View Versioning](../../reference/alpha-feature-view-versioning.md)** reference page.
+

## Schema Validation
Feature views support an optional `enable_validation` parameter that enables schema validation during materialization and historical feature retrieval. When enabled, Feast verifies that:
# Each timestamp acts as the upper bound for the point-in-time join:
+ # Feast retrieves the latest feature values at or before this time,
+ # preventing data leakage from future events.
"event_timestamp": [
    datetime(2021, 4, 12, 10, 59, 42),
    datetime(2021, 4, 12, 8, 12, 10),
@@ -498,7 +501,7 @@ print(training_df.head())
{% endtabs %}
### Step 6: Ingest batch features into your online store

- We now serialize the latest values of features since the beginning of time to prepare for serving. Note, `materialize_incremental` serializes all new features since the last `materialize` call, or since the time provided minus the `ttl` timedelta. In this case, this will be `CURRENT_TIME - 1 day` (`ttl` was set on the `FeatureView` instances in [feature_repo/feature_repo/feature_definitions.py](feature_repo/feature_repo/feature_definitions.py)).
+ We now serialize the latest values of features since the beginning of time to prepare for serving. Note, `materialize_incremental` serializes all new features since the last `materialize` call, or since the time provided minus the `ttl` timedelta. In this case, this will be `CURRENT_TIME - 1 day` (`ttl` was set on the `FeatureView` instances in `feature_definitions.py`).
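
The start of the incremental window described above can be sketched as a small helper. `incremental_start` and its `last_materialized` argument are hypothetical names used for illustration; the function only restates the rule from the text (resume from the previous materialization if there was one, otherwise go back one `ttl` from the provided time) and is not Feast's internal logic.

```python
from datetime import datetime, timedelta
from typing import Optional


def incremental_start(
    end_date: datetime,
    ttl: timedelta,
    last_materialized: Optional[datetime] = None,
) -> datetime:
    """Pick where an incremental materialization run would begin (illustrative only)."""
    if last_materialized is not None:
        # Continue from where the previous materialization stopped.
        return last_materialized
    # First run: fall back to the provided time minus the TTL.
    return end_date - ttl


first_run_start = incremental_start(datetime(2021, 4, 12, 12, 0), timedelta(days=1))
second_run_start = incremental_start(
    datetime(2021, 4, 13, 12, 0),
    timedelta(days=1),
    last_materialized=datetime(2021, 4, 12, 12, 0),
)
```

With a one-day `ttl`, the first run starts at `CURRENT_TIME - 1 day`, as in the quickstart, and a later run picks up at the end of the previous one.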