Merged
@@ -1,9 +1,11 @@
# Batch Materialization Engine

Note: The materialization engine is not constructed via the unified compute engine interface.

A batch materialization engine is a component of Feast that's responsible for moving data from the offline store into the online store.

A materialization engine abstracts over specific technologies or frameworks that are used to materialize data. It allows users to use a pure local serialized approach (which is the default LocalMaterializationEngine), or delegates the materialization to separate components (e.g. AWS Lambda, as implemented by the LambdaMaterializationEngine).
A materialization engine abstracts over specific technologies or frameworks that are used to materialize data. It allows users to use a pure local serialized approach (which is the default LocalComputeEngine), or delegates the materialization to separate components (e.g. AWS Lambda, as implemented by the LambdaComputeEngine).

If the built-in engines are not sufficient, you can create your own custom materialization engine. Please see [this guide](../../how-to-guides/customizing-feast/creating-a-custom-materialization-engine.md) for more details.
If the built-in engines are not sufficient, you can create your own custom materialization engine. Please see [this guide](../../how-to-guides/customizing-feast/creating-a-custom-compute-engine.md) for more details.
There are a couple of other references to creating-a-custom-materialization-engine.md that need to change as well:
https://github.com/search?q=repo%3Afeast-dev%2Ffeast%20creating-a-custom-materialization-engine.md&type=code

For sure, let me update in the next PR


Please see [feature\_store.yaml](../../reference/feature-repository/feature-store-yaml.md#overview) for configuring engines.
@@ -1,24 +1,24 @@
# Adding a custom batch materialization engine
# Adding a custom compute engine

### Overview

Feast batch materialization operations (`materialize` and `materialize-incremental`) execute through a `BatchMaterializationEngine`.
Feast batch materialization operations (`materialize` and `materialize-incremental`) and historical retrieval (`get_historical_features`) are executed through a `ComputeEngine`.

Custom batch materialization engines allow Feast users to extend Feast to customize the materialization process. Examples include:
Custom compute engines allow Feast users to extend Feast to customize the materialization and `get_historical_features` processes. Examples include:

* Setting up custom materialization-specific infrastructure during `feast apply` (e.g. setting up Spark clusters or Lambda Functions)
* Launching custom batch ingestion (materialization) jobs (Spark, Beam, AWS Lambda)
* Tearing down custom materialization-specific infrastructure during `feast teardown` (e.g. tearing down Spark clusters, or deleting Lambda Functions)

Feast comes with built-in materialization engines, e.g, `LocalMaterializationEngine`, and an experimental `LambdaMaterializationEngine`. However, users can develop their own materialization engines by creating a class that implements the contract in the [BatchMaterializationEngine class](https://github.com/feast-dev/feast/blob/6d7b38a39024b7301c499c20cf4e7aef6137c47c/sdk/python/feast/infra/materialization/batch\_materialization\_engine.py#L72).
Feast comes with built-in compute engines, e.g., `LocalComputeEngine`, and an experimental `LambdaComputeEngine`. However, users can develop their own compute engines by creating a class that implements the contract in the [ComputeEngine class](https://github.com/feast-dev/feast/blob/85514edbb181df083e6a0d24672c00f0624dcaa3/sdk/python/feast/infra/compute_engines/base.py#L19).

### Guide

The fastest way to add custom logic to Feast is to extend an existing materialization engine. The most generic engine is the `LocalMaterializationEngine` which contains no cloud-specific logic. The guide that follows will extend the `LocalProvider` with operations that print text to the console. It is up to you as a developer to add your custom code to the engine methods, but the guide below will provide the necessary scaffolding to get you started.
The fastest way to add custom logic to Feast is to implement the `ComputeEngine` interface. The guide that follows will build a custom engine with operations that print text to the console. It is up to you as a developer to add your custom code to the engine methods, but the guide below will provide the necessary scaffolding to get you started.

#### Step 1: Define an Engine class

The first step is to define a custom materialization engine class. We've created the `MyCustomEngine` below. This python file can be placed in your `feature_repo` directory if you're following the Quickstart guide.
The first step is to define a custom compute engine class. We've created the `MyCustomEngine` below. This python file can be placed in your `feature_repo` directory if you're following the Quickstart guide.

```python
from typing import List, Sequence, Union
@@ -27,14 +27,16 @@ from feast.entity import Entity
from feast.feature_view import FeatureView
from feast.batch_feature_view import BatchFeatureView
from feast.stream_feature_view import StreamFeatureView
from feast.infra.materialization.local_engine import LocalMaterializationJob, LocalMaterializationEngine
from feast.infra.common.retrieval_task import HistoricalRetrievalTask
from feast.infra.compute_engines.local.job import LocalMaterializationJob
from feast.infra.compute_engines.base import ComputeEngine
from feast.infra.common.materialization_job import MaterializationTask
from feast.infra.offline_stores.offline_store import OfflineStore
from feast.infra.offline_stores.offline_store import OfflineStore, RetrievalJob
from feast.infra.online_stores.online_store import OnlineStore
from feast.repo_config import RepoConfig


class MyCustomEngine(LocalMaterializationEngine):
class MyCustomEngine(ComputeEngine):
def __init__(
self,
*,
@@ -80,9 +82,13 @@ class MyCustomEngine(LocalMaterializationEngine):
)
for task in tasks
]

def get_historical_features(self, task: HistoricalRetrievalTask) -> RetrievalJob:
raise NotImplementedError
```

Notice how in the above engine we have only overridden two of the methods on the `LocalMaterializationEngine`, namely `update` and `materialize`. These two methods are convenient to replace if you are planning to launch custom batch jobs.
Notice how in the above engine we have only overridden two of the methods of the `ComputeEngine`, namely `update` and `materialize`. These two methods are convenient to replace if you are planning to launch custom batch jobs.
If you want the compute engine to serve `get_historical_features`, you will need to implement that method as well.
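As a hedged, self-contained sketch of where that method plugs in (stand-in classes instead of the real Feast types; all names here are illustrative, not the actual implementation):

```python
from dataclasses import dataclass


@dataclass
class RetrievalTaskStub:
    """Stand-in for feast's HistoricalRetrievalTask."""
    feature_view_name: str


class EngineSketch:
    """Stand-in engine showing where get_historical_features plugs in."""

    def get_historical_features(self, task: RetrievalTaskStub) -> str:
        # A real engine would run a point-in-time-correct join here and
        # return a RetrievalJob (or a pyarrow Table); this stub just reports.
        return f"retrieved features for {task.feature_view_name}"


print(EngineSketch().get_historical_features(RetrievalTaskStub("driver_stats")))
# retrieved features for driver_stats
```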

#### Step 2: Configuring Feast to use the engine
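To point Feast at the engine, the `batch_engine` key in `feature_store.yaml` can name a built-in engine or the fully qualified path of a custom class. A minimal sketch — the `feature_repo.engine` module path is an assumption for illustration:

```yaml
project: my_project
registry: data/registry.db
provider: local
online_store:
  type: sqlite
# Built-in engine name (e.g. "local"), or a fully qualified custom class path:
batch_engine: feature_repo.engine.MyCustomEngine
```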

2 changes: 1 addition & 1 deletion sdk/python/feast/batch_feature_view.py
@@ -79,7 +79,7 @@ def __init__(
ttl: Optional[timedelta] = None,
tags: Optional[Dict[str, str]] = None,
online: bool = False,
offline: bool = True,
offline: bool = False,
@HaoXuAI, Jun 2, 2025:
To be backward compatible for testing

description: str = "",
owner: str = "",
schema: Optional[List[Field]] = None,
3 changes: 2 additions & 1 deletion sdk/python/feast/infra/common/materialization_job.py
@@ -20,7 +20,8 @@ class MaterializationTask:
feature_view: Union[BatchFeatureView, StreamFeatureView, FeatureView]
start_time: datetime
end_time: datetime
tqdm_builder: Callable[[int], tqdm]
only_latest: bool = True
tqdm_builder: Union[None, Callable[[int], tqdm]] = None


class MaterializationJobStatus(enum.Enum):
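The diff above makes `tqdm_builder` optional and adds `only_latest`; a simplified stand-in dataclass (a stub mirroring only the changed fields, not the real class) shows the resulting construction:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class MaterializationTaskStub:
    """Stub mirroring the fields changed in the diff."""
    feature_view_name: str
    start_time: datetime
    end_time: datetime
    only_latest: bool = True                 # new field, defaults to True
    tqdm_builder: Optional[Callable] = None  # previously required, now optional


end = datetime(2025, 6, 2)
task = MaterializationTaskStub("driver_stats", end - timedelta(days=1), end)
print(task.only_latest, task.tqdm_builder)
# True None
```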
@@ -3,13 +3,12 @@
import logging
from concurrent.futures import ThreadPoolExecutor, wait
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Literal, Optional, Sequence, Union
from typing import Literal, Optional, Sequence, Union

import boto3
import pyarrow as pa
from botocore.config import Config
from pydantic import StrictStr
from tqdm import tqdm

from feast import utils
from feast.batch_feature_view import BatchFeatureView
@@ -21,9 +20,8 @@
MaterializationJobStatus,
MaterializationTask,
)
from feast.infra.materialization.batch_materialization_engine import (
BatchMaterializationEngine,
)
from feast.infra.common.retrieval_task import HistoricalRetrievalTask
from feast.infra.compute_engines.base import ComputeEngine
from feast.infra.offline_stores.offline_store import OfflineStore
from feast.infra.online_stores.online_store import OnlineStore
from feast.infra.registry.base_registry import BaseRegistry
@@ -40,8 +38,8 @@
logger = logging.getLogger(__name__)


class LambdaMaterializationEngineConfig(FeastConfigBaseModel):
"""Batch Materialization Engine config for lambda based engine"""
class LambdaComputeEngineConfig(FeastConfigBaseModel):
"""Compute engine config for the Lambda-based engine"""

type: Literal["lambda"] = "lambda"
""" Type selector"""
@@ -82,11 +80,18 @@ def url(self) -> Optional[str]:
return None


class LambdaMaterializationEngine(BatchMaterializationEngine):
class LambdaComputeEngine(ComputeEngine):
"""
WARNING: This engine should be considered "Alpha" functionality.
"""

def get_historical_features(
self, registry: BaseRegistry, task: HistoricalRetrievalTask
) -> pa.Table:
raise NotImplementedError(
"Lambda Compute Engine does not support get_historical_features"
)

def update(
self,
project: str,
@@ -160,30 +165,14 @@ def __init__(
config = Config(read_timeout=DEFAULT_TIMEOUT + 10)
self.lambda_client = boto3.client("lambda", config=config)

def materialize(
self, registry, tasks: List[MaterializationTask]
) -> List[MaterializationJob]:
return [
self._materialize_one(
registry,
task.feature_view,
task.start_time,
task.end_time,
task.project,
task.tqdm_builder,
)
for task in tasks
]

def _materialize_one(
self,
registry: BaseRegistry,
feature_view: Union[BatchFeatureView, StreamFeatureView, FeatureView],
start_date: datetime,
end_date: datetime,
project: str,
tqdm_builder: Callable[[int], tqdm],
self, registry: BaseRegistry, task: MaterializationTask, **kwargs
):
feature_view = task.feature_view
start_date = task.start_time
end_date = task.end_time
project = task.project

entities = []
for entity_name in feature_view.entities:
entities.append(registry.get_entity(entity_name, project))
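In the Lambda engine diff above, `_materialize_one` now receives the whole `MaterializationTask` instead of unpacked arguments, and the base class fans a single task out to a list. A self-contained sketch of that dispatch pattern with stub types (the stubs stand in for Feast's classes):

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TaskStub:
    """Stand-in for feast's MaterializationTask."""
    project: str


class EngineBaseSketch:
    def materialize(self, tasks: Union[TaskStub, List[TaskStub]]) -> List[str]:
        # Mirrors the new base-class behavior: normalize a single task
        # into a list, then delegate each one to _materialize_one.
        if isinstance(tasks, TaskStub):
            tasks = [tasks]
        return [self._materialize_one(t) for t in tasks]

    def _materialize_one(self, task: TaskStub) -> str:
        # Concrete engines (Lambda, Spark, ...) implement the actual job launch.
        return f"materialized {task.project}"


print(EngineBaseSketch().materialize(TaskStub("demo")))
# ['materialized demo']
```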
120 changes: 106 additions & 14 deletions sdk/python/feast/infra/compute_engines/base.py
@@ -1,63 +1,130 @@
from abc import ABC
from typing import Union
from abc import ABC, abstractmethod
from typing import List, Optional, Sequence, Union

import pyarrow as pa

from feast import RepoConfig
from feast.batch_feature_view import BatchFeatureView
from feast.entity import Entity
from feast.feature_view import FeatureView
from feast.infra.common.materialization_job import (
MaterializationJob,
MaterializationTask,
)
from feast.infra.common.retrieval_task import HistoricalRetrievalTask
from feast.infra.compute_engines.dag.context import ColumnInfo, ExecutionContext
from feast.infra.offline_stores.offline_store import OfflineStore
from feast.infra.offline_stores.offline_store import OfflineStore, RetrievalJob
from feast.infra.online_stores.online_store import OnlineStore
from feast.infra.registry.registry import Registry
from feast.infra.registry.base_registry import BaseRegistry
from feast.on_demand_feature_view import OnDemandFeatureView
from feast.stream_feature_view import StreamFeatureView
from feast.utils import _get_column_names


class ComputeEngine(ABC):
"""
The interface that Feast uses to control the compute system that handles materialization and get_historical_features.
The interface that Feast uses to control the compute system that handles materialization and get_historical_features.
Each engine must implement:
- materialize(): to generate and persist features
- get_historical_features(): to perform point-in-time correct joins
- get_historical_features(): to perform historical retrieval of features
Engines should use FeatureBuilder and DAGNode abstractions to build modular, pluggable workflows.
"""

def __init__(
self,
*,
registry: Registry,
repo_config: RepoConfig,
offline_store: OfflineStore,
online_store: OnlineStore,
**kwargs,
):
self.registry = registry
self.repo_config = repo_config
self.offline_store = offline_store
self.online_store = online_store

def materialize(self, task: MaterializationTask) -> MaterializationJob:
raise NotImplementedError
@abstractmethod
def update(
self,
project: str,
views_to_delete: Sequence[
Union[BatchFeatureView, StreamFeatureView, FeatureView]
],
views_to_keep: Sequence[
Union[BatchFeatureView, StreamFeatureView, FeatureView, OnDemandFeatureView]
],
entities_to_delete: Sequence[Entity],
entities_to_keep: Sequence[Entity],
):
"""
Prepares cloud resources required for batch materialization for the specified set of Feast objects.

Args:
project: Feast project to which the objects belong.
views_to_delete: Feature views whose corresponding infrastructure should be deleted.
views_to_keep: Feature views whose corresponding infrastructure should not be deleted, and
may need to be updated.
entities_to_delete: Entities whose corresponding infrastructure should be deleted.
entities_to_keep: Entities whose corresponding infrastructure should not be deleted, and
may need to be updated.
"""
pass

@abstractmethod
def teardown_infra(
self,
project: str,
fvs: Sequence[Union[BatchFeatureView, StreamFeatureView, FeatureView]],
entities: Sequence[Entity],
):
"""
Tears down all cloud resources used by the materialization engine for the specified set of Feast objects.

Args:
project: Feast project to which the objects belong.
fvs: Feature views whose corresponding infrastructure should be deleted.
entities: Entities whose corresponding infrastructure should be deleted.
"""
pass

def get_historical_features(self, task: HistoricalRetrievalTask) -> pa.Table:
def materialize(
self,
registry: BaseRegistry,
tasks: Union[MaterializationTask, List[MaterializationTask]],
**kwargs,
) -> List[MaterializationJob]:
if isinstance(tasks, MaterializationTask):
tasks = [tasks]
return [self._materialize_one(registry, task, **kwargs) for task in tasks]

def _materialize_one(
self,
registry: BaseRegistry,
task: MaterializationTask,
**kwargs,
) -> MaterializationJob:
raise NotImplementedError(
"Materialization is not implemented for this compute engine."
)

def get_historical_features(
self, registry: BaseRegistry, task: HistoricalRetrievalTask
) -> Union[RetrievalJob, pa.Table]:
raise NotImplementedError

def get_execution_context(
self,
registry: BaseRegistry,
task: Union[MaterializationTask, HistoricalRetrievalTask],
) -> ExecutionContext:
entity_defs = [
self.registry.get_entity(name, task.project)
registry.get_entity(name, task.project)
for name in task.feature_view.entities
]
entity_df = None
if hasattr(task, "entity_df") and task.entity_df is not None:
entity_df = task.entity_df

column_info = self.get_column_info(task)
column_info = self.get_column_info(registry, task)
return ExecutionContext(
project=task.project,
repo_config=self.repo_config,
@@ -70,14 +137,39 @@ def get_execution_context(

def get_column_info(
self,
registry: BaseRegistry,
task: Union[MaterializationTask, HistoricalRetrievalTask],
) -> ColumnInfo:
entities = []
for entity_name in task.feature_view.entities:
entities.append(registry.get_entity(entity_name, task.project))

join_keys, feature_cols, ts_col, created_ts_col = _get_column_names(
task.feature_view, self.registry.list_entities(task.project)
task.feature_view, entities
)
field_mapping = self.get_field_mapping(task.feature_view)

return ColumnInfo(
join_keys=join_keys,
feature_cols=feature_cols,
ts_col=ts_col,
created_ts_col=created_ts_col,
field_mapping=field_mapping,
)

def get_field_mapping(
self, feature_view: Union[BatchFeatureView, StreamFeatureView, FeatureView]
) -> Optional[dict]:
"""
Get the field mapping for a feature view.
Args:
feature_view: The feature view to get the field mapping for.

Returns:
A dictionary mapping field names to column names.
"""
if feature_view.stream_source:
return feature_view.stream_source.field_mapping
if feature_view.batch_source:
return feature_view.batch_source.field_mapping
return None
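The `get_field_mapping` fallback added above (stream source takes precedence over batch source) can be exercised in isolation with stub sources — the stubs below are illustrative stand-ins, not the Feast classes:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceStub:
    field_mapping: dict


@dataclass
class FeatureViewStub:
    stream_source: Optional[SourceStub] = None
    batch_source: Optional[SourceStub] = None


def get_field_mapping(fv: FeatureViewStub) -> Optional[dict]:
    # Same precedence as the base class: stream source wins over batch source.
    if fv.stream_source:
        return fv.stream_source.field_mapping
    if fv.batch_source:
        return fv.batch_source.field_mapping
    return None


batch_only = FeatureViewStub(batch_source=SourceStub({"event_ts": "ts"}))
print(get_field_mapping(batch_only))
# {'event_ts': 'ts'}
```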