15 changes: 15 additions & 0 deletions README.md
@@ -109,11 +109,26 @@ print(training_df.head())
```

### 6. Load feature values into your online store

**Option 1: Incremental materialization (recommended)**
```commandline
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
```

**Option 2: Full materialization with timestamps**
```commandline
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize 2021-04-12T00:00:00 $CURRENT_TIME
```

**Option 3: Simple materialization without timestamps**
```commandline
feast materialize --disable-event-timestamp
```

The `--disable-event-timestamp` flag allows you to materialize all available feature data using the current datetime as the event timestamp, without needing to specify start and end timestamps. This is useful when your source data lacks proper event timestamp columns.
Member:

I think we don't need this flag: could we keep the default end time as the current time and the start time as the current time minus 24 hours (or a month)?
The same applies to the `materialize-incremental` command, whose end date could default to the current time.

Member Author:

My concern is how significant that change would be for users. Making the timestamps optional would silently ingest data without telling users about it. I'd rather do more code work to make users opt in than have them ingest data accidentally.

Member:

That's fair 👍
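
The opt-in design agreed on in this thread can be sketched in plain Python. This is a minimal illustration, not Feast's code: `resolve_window` and `_parse_utc` are hypothetical names, and treating naive timestamps as UTC is an assumption made here. The point is that a bare call without timestamps fails loudly instead of falling back to an implicit default window.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def _parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 string; naive values are assumed to be UTC."""
    parsed = datetime.fromisoformat(ts)
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def resolve_window(
    start_ts: Optional[str],
    end_ts: Optional[str],
    disable_event_timestamp: bool = False,
) -> Tuple[datetime, datetime]:
    """Hypothetical helper: choose the materialization window under an opt-in policy."""
    if disable_event_timestamp:
        if start_ts or end_ts:
            raise ValueError("timestamps cannot be combined with the opt-in flag")
        # Explicit opt-in: cover everything from the Unix epoch up to now.
        return datetime(1970, 1, 1, tzinfo=timezone.utc), datetime.now(timezone.utc)
    if not start_ts or not end_ts:
        # No implicit default window, so nothing is ingested by accident.
        raise ValueError("start_ts and end_ts are required without the opt-in flag")
    return _parse_utc(start_ts), _parse_utc(end_ts)
```

With this shape, `resolve_window(None, None)` raises, while `resolve_window(None, None, disable_event_timestamp=True)` returns the epoch-to-now window.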


```commandline
Materializing feature view driver_hourly_stats from 2021-04-14 to 2021-04-15 done!
```
34 changes: 17 additions & 17 deletions docs/getting-started/components/online-store.md
@@ -1,18 +1,18 @@
# Online store

Feast uses online stores to serve features at low latency.
Feature values are loaded from data sources into the online store through _materialization_, which can be triggered through the `materialize` command.

The storage schema of features within the online store mirrors that of the original data source.
One key difference is that for each [entity key](../concepts/entity.md), only the latest feature values are stored.
No historical values are stored.

Here is an example batch data source:

![](../../.gitbook/assets/image%20%286%29.png)

Once the above data source is materialized into Feast (using `feast materialize`), the feature values will be stored as follows:

![](../../.gitbook/assets/image%20%285%29.png)

# Online store

Feast uses online stores to serve features at low latency.
Feature values are loaded from data sources into the online store through _materialization_, which can be triggered through the `materialize` command (either with specific timestamps or using `--disable-event-timestamp` to materialize all data with current timestamps).

The storage schema of features within the online store mirrors that of the original data source.
One key difference is that for each [entity key](../concepts/entity.md), only the latest feature values are stored.
No historical values are stored.

Here is an example batch data source:

![](../../.gitbook/assets/image%20%286%29.png)

Once the above data source is materialized into Feast (using `feast materialize` with timestamps or `feast materialize --disable-event-timestamp`), the feature values will be stored as follows:

![](../../.gitbook/assets/image%20%285%29.png)

Features can also be written directly to the online store via [push sources](../../reference/data-sources/push.md).
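
The "only the latest feature values are stored" behaviour described in this page can be illustrated with a small sketch. It is a plain-Python model and not how Feast implements its online store: `latest_per_entity` is a hypothetical helper, and the column names and values are made up for illustration.

```python
from datetime import datetime
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def latest_per_entity(rows: Iterable[Row], entity_key: str, ts_col: str) -> Dict[Any, Row]:
    """Keep, for each entity key, only the row with the newest timestamp."""
    latest: Dict[Any, Row] = {}
    for row in rows:
        key = row[entity_key]
        if key not in latest or row[ts_col] > latest[key][ts_col]:
            latest[key] = row
    return latest


# Illustrative batch source: two rows for driver 1001, one for driver 1002.
batch_source = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 8), "conv_rate": 0.51},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 9), "conv_rate": 0.62},
    {"driver_id": 1002, "event_timestamp": datetime(2021, 4, 12, 8), "conv_rate": 0.40},
]

# After "materialization", only one row per driver remains.
online_view = latest_per_entity(batch_source, "driver_id", "event_timestamp")
```

Here `online_view` holds one entry per `driver_id`, and the older 08:00 row for driver 1001 is gone.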
2 changes: 1 addition & 1 deletion docs/getting-started/components/overview.md
@@ -7,7 +7,7 @@
* **Create Batch Features:** ELT/ETL systems like Spark and SQL are used to transform data in the batch store.
* **Create Stream Features:** Stream features are created from streaming services such as Kafka or Kinesis, and can be pushed directly into Feast via the [Push API](../../reference/data-sources/push.md).
* **Feast Apply:** The user (or CI) publishes version-controlled feature definitions using `feast apply`. This CLI command updates infrastructure and persists definitions in the object store registry.
* **Feast Materialize:** The user (or scheduler) executes `feast materialize` which loads features from the offline store into the online store.
* **Feast Materialize:** The user (or scheduler) executes `feast materialize` (with timestamps or `--disable-event-timestamp` to materialize all data with current timestamps) which loads features from the offline store into the online store.
* **Model Training:** A model training pipeline is launched. It uses the Feast Python SDK to retrieve a training dataset that can be used for training models.
* **Get Historical Features:** Feast exports a point-in-time correct training dataset based on the list of features and entity dataframe provided by the model training pipeline.
* **Deploy Model:** The trained model binary (and list of features) are deployed into a model serving system. This step is not executed by Feast.
6 changes: 6 additions & 0 deletions docs/getting-started/concepts/data-ingestion.md
@@ -64,11 +64,17 @@ materialize_python = PythonOperator(

#### How to run this in the CLI

**With timestamps:**
```bash
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
```

**Simple materialization (for data without event timestamps):**
```bash
feast materialize --disable-event-timestamp
```

#### How to run this on Airflow

```python
8 changes: 7 additions & 1 deletion docs/getting-started/quickstart.md
@@ -499,13 +499,19 @@ print(training_df.head())
We now serialize the latest values of features since the beginning of time to prepare for serving. Note, `materialize_incremental` serializes all new features since the last `materialize` call, or since the time provided minus the `ttl` timedelta. In this case, this will be `CURRENT_TIME - 1 day` (`ttl` was set on the `FeatureView` instances in [feature_repo/feature_repo/example_repo.py](feature_repo/feature_repo/example_repo.py)).
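
The window rule in the paragraph above can be written out as a small worked example. This is a simplified model of the rule as the text states it, not Feast's implementation; `incremental_start` and its parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def incremental_start(
    end: datetime,
    ttl: timedelta,
    last_materialized_end: Optional[datetime],
) -> datetime:
    """Return the start of the incremental window under the rule described above."""
    if last_materialized_end is not None:
        # A previous run exists: continue from where it stopped.
        return last_materialized_end
    # First run: look back one ttl from the provided end time.
    return end - ttl


current_time = datetime(2021, 4, 15, tzinfo=timezone.utc)
first_run_start = incremental_start(current_time, timedelta(days=1), None)
next_run_start = incremental_start(
    current_time + timedelta(hours=6), timedelta(days=1), current_time
)
```

With a one-day `ttl` and no earlier run, `first_run_start` is `CURRENT_TIME - 1 day`; the later run starts at the previous end time.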

{% tabs %}
{% tab title="Bash" %}
{% tab title="Bash (with timestamp)" %}
```bash
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")

feast materialize-incremental $CURRENT_TIME
```
{% endtab %}
{% tab title="Bash (simple)" %}
```bash
# Alternative: Materialize all data using current timestamp (for data without event timestamps)
feast materialize --disable-event-timestamp
```
{% endtab %}
{% endtabs %}

{% tabs %}
16 changes: 14 additions & 2 deletions docs/reference/feast-cli-commands.md
@@ -152,18 +152,30 @@ feast init -t gcp my_feature_repo

## Materialize

Load data from feature views into the online store between two dates
Load data from feature views into the online store.

**With timestamps:**
```bash
feast materialize 2020-01-01T00:00:00 2022-01-01T00:00:00
```

Load data for specific feature views into the online store between two dates
**Without timestamps (uses current datetime):**
```bash
feast materialize --disable-event-timestamp
```

Load data for specific feature views:

```text
feast materialize -v driver_hourly_stats 2020-01-01T00:00:00 2022-01-01T00:00:00
```

```text
feast materialize --disable-event-timestamp -v driver_hourly_stats
```

The `--disable-event-timestamp` flag is useful when your source data lacks event timestamp columns, allowing you to materialize all available data using the current datetime as the event timestamp.

```text
Materializing 1 feature views from 2020-01-01 to 2022-01-01

46 changes: 46 additions & 0 deletions docs/reference/feature-servers/python-feature-server.md
@@ -200,6 +200,52 @@ requests.post(
data=json.dumps(push_data))
```

### Materializing features

The Python feature server also exposes an endpoint for materializing features from the offline store to the online store.

**Standard materialization with timestamps:**
```bash
curl -X POST "http://localhost:6566/materialize" -d '{
"start_ts": "2021-01-01T00:00:00",
"end_ts": "2021-01-02T00:00:00",
"feature_views": ["driver_hourly_stats"]
}' | jq
```

**Materialize all data without event timestamps:**
```bash
curl -X POST "http://localhost:6566/materialize" -d '{
"feature_views": ["driver_hourly_stats"],
"disable_event_timestamp": true
}' | jq
```

When `disable_event_timestamp` is set to `true`, the `start_ts` and `end_ts` parameters are not required, and all available data is materialized using the current datetime as the event timestamp. This is useful when your source data lacks proper event timestamp columns.

Or from Python:
```python
import json
import requests

# Standard materialization
materialize_data = {
    "start_ts": "2021-01-01T00:00:00",
    "end_ts": "2021-01-02T00:00:00",
    "feature_views": ["driver_hourly_stats"]
}

# Materialize without event timestamps
materialize_data_no_timestamps = {
    "feature_views": ["driver_hourly_stats"],
    "disable_event_timestamp": True
}

requests.post(
    "http://localhost:6566/materialize",
    data=json.dumps(materialize_data))
```
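
Both request shapes can also be built by one small function. The sketch below is a hypothetical client-side helper (`build_materialize_payload` is not part of Feast); it uses only the field names shown in this section. Rejecting timestamps together with the flag is a choice made in this helper, mirroring the CLI check, and is not claimed to be server behaviour.

```python
import json
from typing import List, Optional


def build_materialize_payload(
    feature_views: List[str],
    start_ts: Optional[str] = None,
    end_ts: Optional[str] = None,
    disable_event_timestamp: bool = False,
) -> str:
    """Hypothetical helper: build the JSON body for POST /materialize."""
    if disable_event_timestamp:
        if start_ts or end_ts:
            raise ValueError("omit start_ts/end_ts when disable_event_timestamp is set")
        body = {"feature_views": feature_views, "disable_event_timestamp": True}
    else:
        if not start_ts or not end_ts:
            raise ValueError("start_ts and end_ts are required")
        body = {"start_ts": start_ts, "end_ts": end_ts, "feature_views": feature_views}
    return json.dumps(body)


standard_body = build_materialize_payload(
    ["driver_hourly_stats"], "2021-01-01T00:00:00", "2021-01-02T00:00:00"
)
no_timestamp_body = build_materialize_payload(
    ["driver_hourly_stats"], disable_event_timestamp=True
)
```

Either string can then be passed as `data=` to `requests.post` against the `/materialize` URL used above.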

## Starting the feature server in TLS(SSL) mode

Enabling TLS mode ensures that data between the Feast client and server is transmitted securely. For an ideal production environment, it is recommended to start the feature server in TLS mode.
15 changes: 15 additions & 0 deletions infra/templates/README.md.jinja2
@@ -107,11 +107,26 @@ print(training_df.head())
```

### 6. Load feature values into your online store

**Option 1: Incremental materialization (recommended)**
```commandline
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
```

**Option 2: Full materialization with timestamps**
```commandline
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize 2021-04-12T00:00:00 $CURRENT_TIME
```

**Option 3: Simple materialization without timestamps**
```commandline
feast materialize --disable-event-timestamp
```

The `--disable-event-timestamp` flag allows you to materialize all available feature data using the current datetime as the event timestamp, without needing to specify start and end timestamps. This is useful when your source data lacks proper event timestamp columns.

```commandline
Materializing feature view driver_hourly_stats from 2021-04-14 to 2021-04-15 done!
```
41 changes: 36 additions & 5 deletions sdk/python/feast/cli/cli.py
@@ -303,17 +303,26 @@ def registry_dump_command(ctx: click.Context):


@cli.command("materialize")
@click.argument("start_ts")
@click.argument("end_ts")
@click.argument("start_ts", required=False)
@click.argument("end_ts", required=False)
@click.option(
    "--views",
    "-v",
    help="Feature views to materialize",
    multiple=True,
)
@click.option(
    "--disable-event-timestamp",
    is_flag=True,
    help="Materialize all available data using current datetime as event timestamp (useful when source data lacks event timestamps)",
)
@click.pass_context
def materialize_command(
    ctx: click.Context, start_ts: str, end_ts: str, views: List[str]
    ctx: click.Context,
    start_ts: Optional[str],
    end_ts: Optional[str],
    views: List[str],
    disable_event_timestamp: bool,
):
    """
    Run a (non-incremental) materialization job to ingest data into the online store. Feast
@@ -322,13 +331,35 @@ def materialize_command(
    Views will be materialized.

    START_TS and END_TS should be in ISO 8601 format, e.g. '2021-07-16T19:20:01'

    If --disable-event-timestamp is used, timestamps are not required and all available data will be materialized using the current datetime as the event timestamp.
    """
    store = create_feature_store(ctx)

    if disable_event_timestamp:
        if start_ts or end_ts:
            raise click.UsageError(
                "Cannot specify START_TS or END_TS when --disable-event-timestamp is used"
            )
        now = datetime.now()
        # Query all available data and use current datetime as event timestamp
        start_date = datetime(
            1970, 1, 1
        )  # Beginning of time to capture all historical data
        end_date = now
    else:
        if not start_ts or not end_ts:
            raise click.UsageError(
                "START_TS and END_TS are required unless --disable-event-timestamp is used"
            )
        start_date = utils.make_tzaware(parser.parse(start_ts))
        end_date = utils.make_tzaware(parser.parse(end_ts))

    store.materialize(
        feature_views=None if not views else views,
        start_date=utils.make_tzaware(parser.parse(start_ts)),
        end_date=utils.make_tzaware(parser.parse(end_ts)),
        start_date=start_date,
        end_date=end_date,
        disable_event_timestamp=disable_event_timestamp,
    )


27 changes: 23 additions & 4 deletions sdk/python/feast/feature_server.py
@@ -5,6 +5,7 @@
import time
import traceback
from contextlib import asynccontextmanager
from datetime import datetime
from importlib import resources as importlib_resources
from typing import Any, Dict, List, Optional, Union

@@ -73,9 +74,10 @@ class PushFeaturesRequest(BaseModel):


class MaterializeRequest(BaseModel):
    start_ts: str
    end_ts: str
    start_ts: Optional[str] = None
    end_ts: Optional[str] = None
    feature_views: Optional[List[str]] = None
    disable_event_timestamp: bool = False


class MaterializeIncrementalRequest(BaseModel):
@@ -432,10 +434,27 @@ def materialize(request: MaterializeRequest) -> None:
                resource=_get_feast_object(feature_view, True),
                actions=[AuthzedAction.WRITE_ONLINE],
            )

        if request.disable_event_timestamp:
            # Query all available data and use current datetime as event timestamp
            now = datetime.now()
            start_date = datetime(
                1970, 1, 1
            )  # Beginning of time to capture all historical data
            end_date = now
        else:
            if not request.start_ts or not request.end_ts:
                raise ValueError(
                    "start_ts and end_ts are required when disable_event_timestamp is False"
                )
            start_date = utils.make_tzaware(parser.parse(request.start_ts))
            end_date = utils.make_tzaware(parser.parse(request.end_ts))

        store.materialize(
            utils.make_tzaware(parser.parse(request.start_ts)),
            utils.make_tzaware(parser.parse(request.end_ts)),
            start_date,
            end_date,
            request.feature_views,
            disable_event_timestamp=request.disable_event_timestamp,
        )

@app.post("/materialize-incremental", dependencies=[Depends(inject_user_details)])
3 changes: 3 additions & 0 deletions sdk/python/feast/feature_store.py
@@ -1542,6 +1542,7 @@ def materialize(
        start_date: datetime,
        end_date: datetime,
        feature_views: Optional[List[str]] = None,
        disable_event_timestamp: bool = False,
    ) -> None:
        """
        Materialize data from the offline store into the online store.
@@ -1555,6 +1556,7 @@ def materialize(
            end_date (datetime): End date for time range of data to materialize into the online store
            feature_views (List[str]): Optional list of feature view names. If selected, will only run
                materialization for the specified feature views.
            disable_event_timestamp (bool): If True, materializes all available data using current datetime as event timestamp instead of source event timestamps

        Examples:
            Materialize all features into the online store over the interval
@@ -1609,6 +1611,7 @@ def tqdm_builder(length):
                registry=self._registry,
                project=self.project,
                tqdm_builder=tqdm_builder,
                disable_event_timestamp=disable_event_timestamp,
            )

            self._registry.apply_materialization(
1 change: 1 addition & 0 deletions sdk/python/feast/infra/common/materialization_job.py
@@ -22,6 +22,7 @@ class MaterializationTask:
    end_time: datetime
    only_latest: bool = True
    tqdm_builder: Union[None, Callable[[int], tqdm]] = None
    disable_event_timestamp: bool = False


class MaterializationJobStatus(enum.Enum):
2 changes: 2 additions & 0 deletions sdk/python/feast/infra/passthrough_provider.py
@@ -426,6 +426,7 @@ def materialize_single_feature_view(
        registry: BaseRegistry,
        project: str,
        tqdm_builder: Callable[[int], tqdm],
        disable_event_timestamp: bool = False,
    ) -> None:
        if isinstance(feature_view, OnDemandFeatureView):
            if not feature_view.write_to_online_store:
@@ -445,6 +446,7 @@ def materialize_single_feature_view(
            start_time=start_date,
            end_time=end_date,
            tqdm_builder=tqdm_builder,
            disable_event_timestamp=disable_event_timestamp,
        )
        jobs = self.batch_engine.materialize(registry, task)
        assert len(jobs) == 1