Production machine learning systems can choose from four approaches to serving machine learning predictions (the output
of model inference):
1. Online model inference with online features
2. Offline model inference without online features
3. Online model inference with online features and cached predictions
4. Online model inference without features
## 1. Online Model Inference with Online Features
``` python
features = store.get_online_features(
    # ... feature references and entity rows for the request ...
)
model_predictions = model_server.predict(features)
```
## 2. Offline Model Inference without Online Features
Typically, Machine Learning teams find serving precomputed model predictions to be the most straightforward to implement.
This approach simply treats the model predictions as a feature and serves them from the feature store using the standard
Feast SDK. These predictions are typically generated by a batch process that precomputes the model scores.
As a concrete example, the batch process can be as simple as a script that runs model inference locally for a set of users and
writes the results to a CSV file. That file can then be materialized into the online store so that the predictions can be served
online, as shown in the code below. A sketch of the precompute script itself follows the snippet.
``` python
model_predictions = store.get_online_features(
    feature_refs=[
        "user_data:model_predictions",  # hypothetical feature holding the precomputed score
    ],
    entity_rows=[{"user_id": 1}],  # the entity whose precomputed prediction is needed
)
```
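The precompute script itself is not part of the Feast API, so the following is only a minimal sketch of the batch step described above: the pandas DataFrame, the `run_model` stand-in, and the `user_predictions.csv` file name are illustrative, and it assumes that file is already registered as the source of a feature view in the Feast repository. Only `FeatureStore` and `materialize_incremental` are standard Feast calls.

``` python
from datetime import datetime

import pandas as pd
from feast import FeatureStore


def run_model(df: pd.DataFrame) -> pd.Series:
    """Stand-in for running real model inference locally in the batch script."""
    return pd.Series(0.5, index=df.index)


# Score a set of users offline and tag the rows with an event timestamp so that
# Feast can treat the predictions like any other feature values.
users = pd.DataFrame({"user_id": [1, 2, 3]})
users["model_prediction"] = run_model(users)
users["event_timestamp"] = datetime.utcnow()
users.to_csv("user_predictions.csv", index=False)

# Assuming user_predictions.csv backs a registered feature view, materialization
# loads the precomputed predictions into the online store so they can be served.
store = FeatureStore(repo_path=".")
store.materialize_incremental(end_date=datetime.utcnow())
```

In practice a scheduler (cron, Airflow, and so on) would rerun this script periodically so the materialized predictions stay fresh.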
## 4. Online Model Inference without Features
This approach is common in Large Language Models (LLMs) and other models that do not require features to make predictions.
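Following the style of the earlier snippets, a minimal sketch of this case (reusing the hypothetical `model_server` object from the first example and an assumed `raw_request` payload) is a direct call with no feature lookup:

``` python
# No feature retrieval step: the raw request payload (for example an LLM prompt)
# is passed straight to the model server.
model_predictions = model_server.predict(raw_request)
```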
Note that generative models using Retrieval Augmented Generation (RAG) do require features where the
[document embeddings](../../reference/alpha-vector-database.md) are treated as features, which Feast supports
(this would fall under "Online Model Inference with Online Features").
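The linked alpha vector database page describes a `retrieve_online_documents` API for this retrieval step. As a rough sketch only, assuming that API along with a hypothetical `docs:Embeddings` feature and a query embedding computed elsewhere, the lookup could look like:

``` python
# Hypothetical RAG retrieval: document embeddings stored in Feast are searched for
# the nearest neighbors of the query embedding before prompting the LLM.
query_embedding = [0.1, 0.2, 0.3]  # stand-in for the embedded user question
context_docs = store.retrieve_online_documents(
    feature="docs:Embeddings",  # hypothetical feature view and feature name
    query=query_embedding,
    top_k=3,
).to_dict()
```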
### Client Orchestration
Implicit in the code examples above is a design choice about how clients orchestrate calls to fetch features and run model inference.
The examples above follow a Feast-centric pattern: the client retrieves features first because they are inputs to the model, so the
sequencing is straightforward. An alternative is an inference-centric pattern, in which the client calls an inference endpoint and
the inference service is responsible for orchestrating the feature retrieval and the model call itself.
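As an illustration of the inference-centric pattern, the sketch below shows what such an endpoint might look like; the handler name, the feature reference, and the `store` and `model_server` objects are assumptions carried over from the earlier examples rather than a prescribed Feast API.

``` python
# Inference-centric orchestration: the client sends only the entity key, and the
# inference service looks up features and calls the model on the client's behalf.
def handle_inference_request(user_id: int):
    features = store.get_online_features(
        feature_refs=["user_data:click_through_rate"],  # hypothetical feature
        entity_rows=[{"user_id": user_id}],
    ).to_dict()
    return model_server.predict(features)
```

With this split, callers never need to know which features the model consumes, at the cost of placing Feast access inside the inference service.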