Commit 160cd36

Update model-inference.md
1 parent 23c6c86 commit 160cd36


docs/getting-started/architecture/model-inference.md

Lines changed: 13 additions & 4 deletions
@@ -3,7 +3,7 @@
 Production machine learning systems can choose from four approaches to serving machine learning predictions (the output
 of model inference):
 1. Online model inference with online features
-2. Precomputed (batch) model predictions without online features
+2. Offline model inference without online features
 3. Online model inference with online features and cached predictions
 4. Online model inference without features

@@ -27,10 +27,13 @@ features = store.get_online_features(
 model_predictions = model_server.predict(features)
 ```
 
-## 2. Precomputed (Batch) Model Predictions without Online Features
+## 2. Offline Model Inference without Online Features
 Typically, Machine Learning teams find serving precomputed model predictions to be the most straightforward to implement.
 This approach simply treats the model predictions as a feature and serves them from the feature store using the standard
-Feast sdk.
+Feast SDK. These model predictions are typically generated through a batch process in which the model scores are precomputed.
+As a concrete example, the batch process can be as simple as a script that runs model inference locally for a set of users and
+writes the results to a CSV. That output file can then be materialized into the online store so that the predictions can be
+served online, as shown in the code below.
 ```python
 model_predictions = store.get_online_features(
     feature_refs=[
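
For reference, a minimal sketch of the batch-scoring workflow described in the added lines above. The `score_model` stand-in, the `users.csv` and `user_predictions.parquet` paths, and the `user_id` entity column are illustrative assumptions, not part of Feast; the sketch also assumes a feature repository whose feature view's batch source points at the generated file, so that `store.materialize(...)` can load the precomputed predictions into the online store.

```python
# Hypothetical batch-scoring script; file names, columns, and the feature repo are assumptions.
from datetime import datetime, timedelta, timezone

import pandas as pd
from feast import FeatureStore


def score_model(frame: pd.DataFrame) -> pd.Series:
    """Stand-in for real batch inference; replace with your model's predict call."""
    return pd.Series(0.5, index=frame.index)


# 1. Run model inference offline for a batch of users.
users = pd.read_csv("users.csv")                       # entity rows to score
users["prediction"] = score_model(users)               # precomputed model scores
users["event_timestamp"] = datetime.now(timezone.utc)  # timestamp column expected by Feast

# 2. Write the scores to the file backing the feature view's batch source
#    (Parquet here; adjust to however the batch source is configured).
users[["user_id", "prediction", "event_timestamp"]].to_parquet("user_predictions.parquet")

# 3. Materialize the precomputed predictions into the online store so they can be
#    served with store.get_online_features(...), as in the snippet above.
store = FeatureStore(repo_path=".")
store.materialize(
    start_date=datetime.now(timezone.utc) - timedelta(days=1),
    end_date=datetime.now(timezone.utc),
)
```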
@@ -85,4 +88,10 @@ approach is common in Large Language Models (LLMs) and other models that do not
 
 Note that generative models using Retrieval Augmented Generation (RAG) do require features where the
 [document embeddings](../../reference/alpha-vector-database.md) are treated as features, which Feast supports
-(this would fall under "Online Model Inference with Online Features").
+(this would fall under "Online Model Inference with Online Features").
+
+### Client Orchestration
+Implicit in the code examples above is a design choice about how clients orchestrate the calls to fetch features and run model inference.
+The examples above follow a Feast-centric pattern: because the features are inputs to the model, the client fetches them first and then
+calls the model, so the sequencing is straightforward. An alternative is an inference-centric pattern, where the client calls an
+inference endpoint and the inference service is responsible for orchestrating the feature retrieval.
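
To make the two patterns in the new "Client Orchestration" subsection concrete, a rough sketch follows. The endpoint URLs, feature references, and the `user_id` entity key are hypothetical, and the inference-centric service itself (which would call Feast internally) is not shown; the sketch uses the `features=` keyword of recent Feast releases rather than the `feature_refs=` form shown in the diff.

```python
# Illustrative only: endpoint URLs, feature names, and the user_id entity are assumptions.
import requests
from feast import FeatureStore

store = FeatureStore(repo_path=".")


def predict_feast_centric(user_id: int) -> dict:
    """Feast-centric: the client fetches the features first, then calls the model server."""
    features = store.get_online_features(
        features=["user_stats:credit_score", "user_stats:account_age_days"],
        entity_rows=[{"user_id": user_id}],
    ).to_dict()
    return requests.post("http://model-server/predict", json=features).json()


def predict_inference_centric(user_id: int) -> dict:
    """Inference-centric: the client only sends the entity key; the inference service
    is responsible for fetching features from Feast and running the model."""
    return requests.post(
        "http://inference-service/predict", json={"user_id": user_id}
    ).json()
```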
