In this tutorial we will:
- Deploy a local feature store with a Parquet file offline store and SQLite online store.
- Build a training dataset using our time series features from our Parquet files.
- Materialize feature values from the offline store into the online store.
- Read the latest features from the online store for inference.
You can run this tutorial in Google Colab or run it on your local host, following the guided steps below.
Install the Feast SDK and CLI using pip:
pip install feast
Bootstrap a new feature repository using feast init from the command line:
feast init feature_repo
cd feature_repo
Creating a new Feast repository in /home/Jovyan/feature_repo.
The apply command registers all the objects in your feature repository and deploys a feature store:
feast apply
Registered entity driver_id
Registered feature view driver_hourly_stats
Deploying infrastructure for driver_hourly_stats
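With the feature view registered, the historical retrieval in the next step joins each entity row to the newest feature values recorded at or before that row's event_timestamp. The block below is a minimal standard-library sketch of that point-in-time lookup, not Feast's implementation: the rows, values, and the helper names build_index and point_in_time_lookup are hypothetical, and it ignores details such as feature TTLs.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature rows: (driver_id, event_timestamp, conv_rate).
# The values are illustrative and do not come from the tutorial's Parquet file.
feature_rows = [
    (1001, datetime(2021, 4, 12, 9, 0), 0.40),
    (1001, datetime(2021, 4, 12, 10, 0), 0.45),
    (1001, datetime(2021, 4, 12, 11, 0), 0.50),
    (1002, datetime(2021, 4, 12, 8, 0), 0.33),
]


def build_index(rows):
    """Group rows by driver_id, with each group sorted by timestamp."""
    index = {}
    for driver_id, ts, value in sorted(rows, key=lambda r: (r[0], r[1])):
        times, values = index.setdefault(driver_id, ([], []))
        times.append(ts)
        values.append(value)
    return index


def point_in_time_lookup(index, driver_id, as_of):
    """Return the newest value with timestamp <= as_of, or None if there is none."""
    if driver_id not in index:
        return None
    times, values = index[driver_id]
    pos = bisect_right(times, as_of)  # number of rows at or before as_of
    return values[pos - 1] if pos else None


index = build_index(feature_rows)

# Entity rows mirror the shape of entity_df: a key plus an event timestamp.
entity_rows = [
    (1001, datetime(2021, 4, 12, 10, 59, 42)),
    (1002, datetime(2021, 4, 12, 8, 12, 10)),
    (1003, datetime(2021, 4, 12, 16, 40, 26)),
]
training_rows = [
    (driver_id, ts, point_in_time_lookup(index, driver_id, ts))
    for driver_id, ts in entity_rows
]
```

Driver 1001 picks up the 10:00 value rather than the 11:00 one, because the 11:00 row lies after its event timestamp; this is what keeps future data from leaking into training rows. Driver 1003 has no feature rows in this sketch, so its value is None.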
The get_historical_features method builds a training dataset based on the time-series features defined in the feature repository:
from datetime import datetime

import pandas as pd
from feast import FeatureStore

# Entity rows: one driver_id and one event_timestamp per training example.
entity_df = pd.DataFrame.from_dict(
    {
        "driver_id": [1001, 1002, 1003, 1004],
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
            datetime(2021, 4, 12, 16, 40, 26),
            datetime(2021, 4, 12, 15, 1, 12),
        ],
    }
)

# Load the feature store from the repository in the current directory.
store = FeatureStore(repo_path=".")

# Join the requested features onto the entity rows and return a DataFrame.
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()
print(training_df.head())
event_timestamp  driver_id  driver_hourly_stats__conv_rate  driver_hourly_stats__acc_rate  driver_hourly_stats__avg_daily_trips
2021-04-12       1002       0.328245                        0.993218                       329
2021-04-12       1001       0.448272                        0.873785                       767
2021-04-12       1004       0.822571                        0.571790                       673
2021-04-12       1003       0.556326                        0.605357                       335
The materialize command loads the latest feature values from your feature views into your online store:
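Conceptually, materialization reduces each entity's history to its newest row up to a cutoff time and writes that row to a key-value style table that can be read by key at serving time. The block below is a minimal sketch of that idea using Python's sqlite3 module. The table name online_features, its schema, the sample rows, and the helpers materialize and read_online are all assumptions made for illustration; they are not Feast's actual online store layout or API.

```python
import sqlite3
from datetime import datetime

# Hypothetical offline rows: (driver_id, event_timestamp, conv_rate).
offline_rows = [
    (1001, datetime(2021, 4, 12, 9, 0), 0.40),
    (1001, datetime(2021, 4, 12, 10, 0), 0.45),
    (1002, datetime(2021, 4, 12, 8, 0), 0.33),
]


def materialize(conn, rows, end_time):
    """Write the newest row per driver with timestamp <= end_time."""
    latest = {}
    for driver_id, ts, conv_rate in rows:
        if ts > end_time:
            continue  # skip rows newer than the cutoff
        if driver_id not in latest or ts > latest[driver_id][0]:
            latest[driver_id] = (ts, conv_rate)
    conn.executemany(
        "INSERT OR REPLACE INTO online_features VALUES (?, ?, ?)",
        [(d, ts.isoformat(), v) for d, (ts, v) in latest.items()],
    )
    conn.commit()


def read_online(conn, driver_id):
    """Return the stored conv_rate for one driver, or None if absent."""
    row = conn.execute(
        "SELECT conv_rate FROM online_features WHERE driver_id = ?",
        (driver_id,),
    ).fetchone()
    return None if row is None else row[0]


# An in-memory database stands in for the on-disk online store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE online_features ("
    "driver_id INTEGER PRIMARY KEY, event_timestamp TEXT, conv_rate REAL)"
)
materialize(conn, offline_rows, end_time=datetime(2021, 4, 12, 12, 0))
```

After this runs, read_online(conn, 1001) returns the 10:00 value, since the older 09:00 row was discarded before writing. With Feast itself, the equivalent step is the pair of shell commands that follow.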
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
Read the latest features from the online store for inference:
from pprint import pprint
from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
pprint(feature_vector)
{
    'driver_id': [1001],
    'conv_rate': [0.49274],
    'acc_rate': [0.92743],
    'avg_daily_trips': [72],
}
- Follow our Getting Started guide for a hands-on tutorial on using Feast.
- Join other Feast users and contributors in Slack and become part of the community!
- For more tutorials, check the Tutorials section.
