This section covers advanced use cases and extension points for power users and contributors who need to customize Feast beyond its default configurations: the plugin architecture, custom stores and compute engines, real-time ingestion, and feature serving observability.
For basic concepts and getting started, see Overview. For production deployment patterns, see Production Deployment Patterns.
Feast's architecture is designed with modularity as a core principle, enabling organizations to replace or extend components while maintaining compatibility with the rest of the system. The architecture uses abstract base classes (ABCs) that define contracts for different system components, allowing custom implementations to be plugged in through configuration.
Extension Points in Feast Architecture
Sources: sdk/python/feast/repo_config.py39-107 sdk/python/feast/infra/provider.py49-67 sdk/python/feast/infra/passthrough_provider.py58-129
Feast defines several critical ABCs that serve as extension points:
| Component | ABC Location | Primary Abstract Methods | Configuration |
|---|---|---|---|
| `Provider` | sdk/python/feast/infra/provider.py49-531 | `update_infra()`, `teardown_infra()`, `online_write_batch()`, `materialize_single_feature_view()`, `get_historical_features()`, `get_online_features()`, `validate_data_source()` | Configured via `provider` field in `feature_store.yaml` |
| `OfflineStore` | sdk/python/feast/infra/offline_stores/offline_store.py35-218 | `get_historical_features()`, `pull_latest_from_table_or_query()`, `pull_all_from_table_or_query()`, `offline_write_batch()` (optional) | Requires companion `OfflineStoreConfig` class with suffix `Config` |
| `OnlineStore` | sdk/python/feast/infra/online_stores/online_store.py35-326 | `online_write_batch()`, `online_read()`, `update()`, `teardown()`, `retrieve_online_documents()` (optional) | Requires companion `OnlineStoreConfig` class with suffix `Config` |
| `ComputeEngine` | sdk/python/feast/infra/compute_engines/base.py | `materialize()`, `materialize_incremental()`, `update()`, `teardown_infra()` | Configured via `batch_engine` field |
Each store implementation must provide a corresponding configuration class. For example, `SqliteOnlineStore` requires `SqliteOnlineStoreConfig` sdk/python/feast/infra/online_stores/sqlite.py103-115. The config class name is derived by appending `Config` to the store class name sdk/python/feast/repo_config.py683-686.
Sources: sdk/python/feast/infra/provider.py49-531 sdk/python/feast/infra/online_stores/online_store.py35-326 sdk/python/feast/infra/online_stores/sqlite.py103-115 sdk/python/feast/repo_config.py678-714
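The `Config`-suffix convention can be illustrated with a small, self-contained sketch. `MyRedisOnlineStore` and its companion class are hypothetical names invented for this example; only the name derivation mirrors the rule described above.

```python
# Hypothetical store/config pair illustrating Feast's naming convention:
# every store class needs a companion config class named "<StoreClass>Config".
class MyRedisOnlineStore:
    """A hypothetical custom online store implementation."""


class MyRedisOnlineStoreConfig:
    """Companion config; Feast derives this name by appending 'Config'."""


def companion_config_name(store_class: type) -> str:
    # Mirrors the derivation in repo_config.py: append "Config" to the
    # store class name to locate its configuration class.
    return store_class.__name__ + "Config"


print(companion_config_name(MyRedisOnlineStore))  # MyRedisOnlineStoreConfig
```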
Feast loads custom implementations dynamically using type strings from feature_store.yaml. The RepoConfig class validates configuration and instantiates the appropriate classes:
Configuration Loading and Plugin Instantiation Sequence
The resolution process works as follows:
1. `load_repo_config()` sdk/python/feast/repo_config.py720-770 reads `feature_store.yaml`
2. `RepoConfig.__init__()` sdk/python/feast/repo_config.py318-364 validates each store configuration
3. `_validate_offline_store_config()` sdk/python/feast/repo_config.py531-555 checks offline store setup
4. `get_offline_config_from_type()` sdk/python/feast/repo_config.py708-713 resolves the type string to a class
5. `import_class()` sdk/python/feast/importer.py dynamically loads the class

Custom implementations must follow naming conventions:
- `OfflineStore` class name suffix sdk/python/feast/repo_config.py703-705
- `OnlineStore` class name suffix sdk/python/feast/repo_config.py681-682
- `Config` suffix for configuration classes sdk/python/feast/repo_config.py683-686

Sources: sdk/python/feast/repo_config.py318-714 sdk/python/feast/feature_store.py124-161 sdk/python/feast/importer.py29-52
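The type-string convention can be sketched with a hypothetical `feature_store.yaml`. The module path, class name, and config field below are invented for illustration; the custom class must be importable on the `PYTHONPATH` so `import_class()` can load it.

```yaml
project: my_project
registry: data/registry.db
provider: local
online_store:
  # Fully qualified type string resolved by import_class();
  # module, class, and config field names here are hypothetical.
  type: my_company.feast_plugins.redis_store.MyRedisOnlineStore
  connection_string: localhost:6379
```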
Beyond the standard batch materialization workflow, Feast supports advanced patterns for real-time feature ingestion and complex data transformations.
The Push API enables real-time feature ingestion by writing directly to online and/or offline stores. The FeatureStore.push() method sdk/python/feast/feature_store.py1650-1750 accepts dataframes and routes them based on the PushMode configuration.
Push API Data Flow with Provider Delegation
The PushSource class sdk/python/feast/data_source.py443-542 defines three modes:
- `PushMode.ONLINE`: Write only to online store sdk/python/feast/data_source.py28-30
- `PushMode.OFFLINE`: Write only to offline store sdk/python/feast/data_source.py31-33
- `PushMode.ONLINE_AND_OFFLINE`: Write to both stores sdk/python/feast/data_source.py34-36

Key implementation methods:
- `FeatureStore.push()` sdk/python/feast/feature_store.py1650-1750 - Main entry point
- `PassthroughProvider._prep_rows_to_write_for_ingestion()` sdk/python/feast/infra/passthrough_provider.py340-390 - Data preparation
- `PassthroughProvider.ingest_df()` sdk/python/feast/infra/passthrough_provider.py175-189 - Online ingestion
- `PassthroughProvider.ingest_df_to_offline_store()` sdk/python/feast/infra/passthrough_provider.py207-219 - Offline ingestion

Sources: sdk/python/feast/data_source.py28-36 sdk/python/feast/data_source.py443-542 sdk/python/feast/feature_store.py1650-1750 sdk/python/feast/infra/passthrough_provider.py175-219 sdk/python/feast/infra/passthrough_provider.py340-390
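A hedged usage sketch of the Push API follows. The push source name, column names, and repo path are assumptions, and the Feast imports are deferred into the function so the sketch stays importable even without a Feast repo on hand.

```python
from datetime import datetime, timezone

import pandas as pd


def build_event_df() -> pd.DataFrame:
    # A minimal event dataframe; entity and feature columns are illustrative.
    return pd.DataFrame(
        {
            "driver_id": [1001],
            "event_timestamp": [datetime.now(timezone.utc)],
            "conv_rate": [0.85],
        }
    )


def push_driver_stats(repo_path: str = ".") -> None:
    # Deferred imports: requires `pip install feast` and an initialized repo.
    from feast import FeatureStore
    from feast.data_source import PushMode

    store = FeatureStore(repo_path=repo_path)
    # PushMode routes the rows; here they go to both online and offline stores.
    store.push(
        "driver_stats_push_source",  # hypothetical PushSource name
        build_event_df(),
        to=PushMode.ONLINE_AND_OFFLINE,
    )
```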
StreamFeatureView sdk/python/feast/stream_feature_view.py extends FeatureView with dual-path data ingestion. It requires both a batch_source (for historical retrieval) and a stream_source (for real-time writes).
Supported stream sources:
- `PushSource` sdk/python/feast/data_source.py443-542 - Generic push interface
- `KafkaSource` sdk/python/feast/data_source.py309-394 - Kafka stream integration
- `KinesisSource` sdk/python/feast/data_source.py397-440 - AWS Kinesis integration

Dual-Path Architecture of Stream Feature Views
When defining a StreamFeatureView, the stream_source must have a batch_source property sdk/python/feast/feature_view.py186-191. This ensures historical data can be retrieved even when streaming ingestion is used for online features.
Example configuration pattern:
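The configuration pattern can be sketched as follows. All names, paths, and field choices are illustrative assumptions, and the Feast imports are deferred into the function so the sketch stays importable without Feast installed.

```python
def define_driver_stats_stream_view():
    # Deferred imports: requires `pip install feast`.
    from datetime import timedelta

    from feast import Entity, Field, FileSource, PushSource, StreamFeatureView
    from feast.types import Float32

    driver = Entity(name="driver", join_keys=["driver_id"])

    # The batch source backs historical retrieval; the stream (push) source
    # handles real-time writes; this is the dual-path pattern described above.
    batch_source = FileSource(
        path="data/driver_stats.parquet",  # hypothetical path
        timestamp_field="event_timestamp",
    )
    stream_source = PushSource(
        name="driver_stats_push_source",
        batch_source=batch_source,
    )

    return StreamFeatureView(
        name="driver_stats_stream",
        entities=[driver],
        ttl=timedelta(days=1),
        schema=[Field(name="conv_rate", dtype=Float32)],
        source=stream_source,
        online=True,
    )
```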
Sources: sdk/python/feast/stream_feature_view.py sdk/python/feast/feature_view.py179-200 sdk/python/feast/data_source.py309-542 sdk/python/feast/feature_store.py418-460
Feature service logging captures online feature serving requests and responses for monitoring, debugging, and analysis. The system writes logs to offline stores for batch analysis.
A FeatureService can be configured with logging via the logging_config parameter. The logged data includes entity values, feature values, timestamps, and optional request IDs.
Feature Service Logging Data Flow
Key components:
- `LoggingConfig` sdk/python/feast/feature_logging.py - Configuration with sample rate and destination
- `LoggingDestination` sdk/python/feast/feature_logging.py - Abstract destination (file/BigQuery/Redshift)
- `FeatureServiceLoggingSource` sdk/python/feast/feature_logging.py - Schema inferred from the feature service

The logged schema includes:
- `event_timestamp` - Timestamp of the feature retrieval
- `created_timestamp` - Log creation time
- `request_id` - Optional request identifier

Sources: sdk/python/feast/feature_logging.py sdk/python/feast/feature_service.py
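A hedged sketch of wiring logging into a feature service follows. The service name, path, and sample rate are assumptions, and the file-based destination class is assumed from Feast's file offline store; imports are deferred so the sketch stays importable without Feast installed.

```python
def define_logged_feature_service():
    # Deferred imports: requires `pip install feast`.
    from feast import FeatureService
    from feast.feature_logging import LoggingConfig
    from feast.infra.offline_stores.file_source import FileLoggingDestination

    return FeatureService(
        name="driver_service",  # hypothetical service name
        features=[],  # the feature views to serve would be listed here
        logging_config=LoggingConfig(
            sample_rate=0.1,  # log roughly 10% of serving requests
            destination=FileLoggingDestination(path="data/feature_logs"),
        ),
    )
```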
Feature logs are written through the FeatureStore and Provider interfaces:
| Method | Location | Purpose |
|---|---|---|
| `FeatureStore.write_logged_features()` | sdk/python/feast/feature_store.py1800-1866 | Writes feature logs to offline store |
| `FeatureStore.retrieve_feature_service_logs()` | sdk/python/feast/feature_store.py1868-1938 | Retrieves logs for analysis within a time window |
| `Provider.write_feature_service_logs()` | sdk/python/feast/infra/provider.py378-397 | Provider-level abstraction for log writing |
| `Provider.retrieve_feature_service_logs()` | sdk/python/feast/infra/provider.py400-421 | Provider-level abstraction for log retrieval |
| `OfflineStore.write_logged_features()` | Store-specific implementation | Writes to specific offline store backend |
Example usage for writing logs:
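A hedged sketch for writing logs follows. The column values are illustrative, and the assumption is that `write_logged_features()` accepts a pyarrow Table (plus the `FeatureService` whose inferred schema the rows follow); the pyarrow import is deferred into the function.

```python
from datetime import datetime, timezone

import pandas as pd


def build_log_frame() -> pd.DataFrame:
    # Illustrative log rows following the schema described above:
    # entities, features, timestamps, and an optional request id.
    now = datetime.now(timezone.utc)
    return pd.DataFrame(
        {
            "driver_id": [1001],
            "conv_rate": [0.85],
            "event_timestamp": [now],
            "created_timestamp": [now],
            "request_id": ["req-123"],
        }
    )


def write_service_logs(store, feature_service) -> None:
    # `store` is a feast.FeatureStore and `feature_service` a FeatureService.
    import pyarrow as pa  # deferred: ships with Feast's offline store support

    store.write_logged_features(
        logs=pa.Table.from_pandas(build_log_frame()),
        source=feature_service,
    )
```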
Example usage for retrieving logs:
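A hedged sketch for retrieving logs follows. The assumption is that `retrieve_feature_service_logs()` takes a time window and returns a retrieval job whose `to_df()` materializes the logged rows; the 24-hour window is an arbitrary choice.

```python
from datetime import datetime, timedelta, timezone


def analysis_window(hours: int = 24):
    # Compute a [start, end] window ending now, used for retrieval below.
    end = datetime.now(timezone.utc)
    return end - timedelta(hours=hours), end


def fetch_service_logs(store, feature_service):
    # `store` is a feast.FeatureStore and `feature_service` a FeatureService.
    start, end = analysis_window()
    return store.retrieve_feature_service_logs(
        feature_service=feature_service,
        start_date=start,
        end_date=end,
    ).to_df()
```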
The logged data includes all entities, features, `event_timestamp`, `created_timestamp`, and optional `request_id` fields. The schema is automatically inferred from the FeatureService definition sdk/python/feast/feature_logging.py.
Sources: sdk/python/feast/feature_store.py1800-1938 sdk/python/feast/infra/provider.py378-421 sdk/python/feast/feature_logging.py
The default LocalComputeEngine loads all data into memory, which doesn't scale for large datasets. Custom compute engines enable distributed materialization by implementing the ComputeEngine ABC.
The ComputeEngine ABC sdk/python/feast/infra/compute_engines/base.py defines the contract for batch materialization:
ComputeEngine Class Hierarchy
Built-in compute engines registered in BATCH_ENGINE_CLASS_FOR_TYPE sdk/python/feast/repo_config.py46-53:
- `local`: feast.infra.compute_engines.local.compute.LocalComputeEngine
- `snowflake.engine`: feast.infra.compute_engines.snowflake.snowflake_engine.SnowflakeComputeEngine
- `lambda`: feast.infra.compute_engines.aws_lambda.lambda_engine.LambdaComputeEngine
- `spark.engine`: feast.infra.compute_engines.spark.compute.SparkComputeEngine
- `ray.engine`: feast.infra.compute_engines.ray.compute.RayComputeEngine

The `PassthroughProvider.batch_engine` property sdk/python/feast/infra/passthrough_provider.py92-129 lazily instantiates the compute engine based on configuration.
Sources: sdk/python/feast/infra/compute_engines/base.py sdk/python/feast/repo_config.py46-53 sdk/python/feast/infra/passthrough_provider.py92-129
The MaterializationTask class sdk/python/feast/infra/common/materialization_job.py packages all information needed to materialize a single feature view:
| Attribute | Type | Purpose |
|---|---|---|
| `feature_view` | `Union[FeatureView, OnDemandFeatureView]` | The feature view to materialize |
| `start_time` | `datetime` | Start of materialization time range |
| `end_time` | `datetime` | End of materialization time range |
| `project` | `str` | Feast project name |
| `tqdm_builder` | `Callable[[int], tqdm]` | Progress bar builder function |
The compute engine receives a list of MaterializationTask objects and can execute them in parallel or sequentially. Each task is independent and can be distributed across workers.
Example pattern for custom compute engine:
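The task-parallel pattern can be sketched in pure Python. `DistributedComputeEngine` and the `Task` dataclass below are hypothetical stand-ins for Feast's `ComputeEngine` ABC and `MaterializationTask`, used only to show how independent tasks fan out to workers.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Task:
    # Hypothetical stand-in for MaterializationTask.
    feature_view_name: str
    start_time: datetime
    end_time: datetime


class DistributedComputeEngine:
    """Hypothetical engine: each task is independent, so tasks can fan out
    to a worker pool (or a cluster) instead of running serially."""

    def __init__(self, max_workers: int = 4):
        self.max_workers = max_workers

    def _materialize_one(self, task: Task) -> str:
        # A real engine would pull rows from the offline store for the task's
        # time range and write them to the online store here.
        return f"materialized {task.feature_view_name}"

    def materialize(self, tasks: list[Task]) -> list[str]:
        # Fan each task out to the pool; results come back in task order.
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            return list(pool.map(self._materialize_one, tasks))
```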
The MaterializationJob return type sdk/python/feast/infra/common/materialization_job.py tracks job status and provides methods to check completion.
Sources: sdk/python/feast/infra/common/materialization_job.py sdk/python/feast/infra/compute_engines/base.py
The Provider class sdk/python/feast/infra/provider.py49-531 serves as the primary orchestration layer, coordinating between offline stores, online stores, and compute engines. The PassthroughProvider sdk/python/feast/infra/passthrough_provider.py58-549 is the default implementation that delegates operations to configured stores.
Key responsibilities of Provider:
| Category | Methods | Purpose |
|---|---|---|
| Infrastructure | update_infra(), teardown_infra(), plan_infra() | Manage cloud resources for feature views |
| Online Operations | online_write_batch(), online_read(), get_online_features(), retrieve_online_documents() | Low-latency feature serving |
| Offline Operations | get_historical_features(), retrieve_saved_dataset() | Point-in-time correct training data |
| Materialization | materialize_single_feature_view() | Batch load features to online store |
| Logging | write_feature_service_logs(), retrieve_feature_service_logs() | Feature serving observability |
| Validation | validate_data_source(), get_table_column_names_and_types_from_data_source() | Data source validation |
| Lifecycle | initialize(), close() | Async initialization and cleanup |
The PassthroughProvider delegates to stores via these properties:
- `offline_store` sdk/python/feast/infra/passthrough_provider.py77-83 - Lazy-loaded via `get_offline_store_from_config()`
- `online_store` sdk/python/feast/infra/passthrough_provider.py69-75 - Lazy-loaded via `get_online_store_from_config()`
- `batch_engine` sdk/python/feast/infra/passthrough_provider.py92-129 - Lazy-loaded from config

Sources: sdk/python/feast/infra/provider.py49-531 sdk/python/feast/infra/passthrough_provider.py58-549
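The lazy-loading pattern behind these properties can be sketched generically. `LazyProvider` and its dict-based loader are hypothetical stand-ins, not Feast code; they show only the construct-on-first-access-then-cache structure.

```python
class LazyProvider:
    """Hypothetical sketch of PassthroughProvider's lazy store properties:
    the store is constructed from config on first access, then cached."""

    def __init__(self, config: dict):
        self.config = config
        self._online_store = None

    @property
    def online_store(self):
        if self._online_store is None:
            # Stand-in for get_online_store_from_config(self.config);
            # a real provider would import and instantiate the store class.
            self._online_store = {"type": self.config["online_store_type"]}
        return self._online_store
```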
Feast supports asynchronous operations through async method variants. The async_supported property indicates which operations support async execution.
Async Capability Detection and Method Selection
Capability detection is done via the async_supported property:
- `Provider.async_supported` sdk/python/feast/infra/provider.py64-66 returns `ProviderAsyncMethods`
- `OnlineStore.async_supported` sdk/python/feast/infra/online_stores/online_store.py40-42 returns `SupportedAsyncMethods`
- `ProviderAsyncMethods` sdk/python/feast/infra/supported_async_methods.py has an `online` field with read/write booleans

Async method pairs:
- `online_write_batch()` / `online_write_batch_async()` sdk/python/feast/infra/provider.py123-173
- `online_read()` / `online_read_async()` sdk/python/feast/infra/provider.py284-359
- `get_online_features()` / `get_online_features_async()` sdk/python/feast/infra/provider.py308-335

Example usage:
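A hedged sketch of the async read path follows. The feature reference and entity row are illustrative, and `store` is assumed to be a `feast.FeatureStore` whose online store supports async reads.

```python
import asyncio


async def read_features_async(store, entity_rows):
    # get_online_features_async mirrors get_online_features but awaits
    # the online store's async read path.
    response = await store.get_online_features_async(
        features=["driver_stats:conv_rate"],  # illustrative feature reference
        entity_rows=entity_rows,
    )
    return response.to_dict()


def read_features(store):
    # Driving the coroutine from synchronous code.
    return asyncio.run(read_features_async(store, [{"driver_id": 1001}]))
```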
Sources: sdk/python/feast/infra/provider.py64-66 sdk/python/feast/infra/provider.py123-359 sdk/python/feast/infra/supported_async_methods.py sdk/python/feast/infra/online_stores/online_store.py40-42 sdk/python/tests/foo_provider.py36-49
The following subsections provide detailed guidance on specific advanced topics:
Covers how to implement custom offline stores, online stores, compute engines, and providers.
A deep dive into real-time feature ingestion.
Production monitoring and observability.
Sources: docs/SUMMARY.md72-79 README.md166-243