2 changes: 2 additions & 0 deletions docs/SUMMARY.md
@@ -28,6 +28,7 @@
* [\[Alpha\] Saved dataset](getting-started/concepts/dataset.md)
* [Permission](getting-started/concepts/permission.md)
* [Tags](getting-started/concepts/tags.md)
* [Use Cases](getting-started/use-cases.md)
* [Components](getting-started/components/README.md)
* [Overview](getting-started/components/overview.md)
* [Registry](getting-started/components/registry.md)
@@ -50,6 +51,7 @@
* [Driver stats on Snowflake](tutorials/tutorials-overview/driver-stats-on-snowflake.md)
* [Validating historical features with Great Expectations](tutorials/validating-historical-features.md)
* [Building streaming features](tutorials/building-streaming-features.md)
* [Retrieval Augmented Generation (RAG) with Feast](tutorials/rag-with-docling.md)

## How-to Guides

117 changes: 117 additions & 0 deletions docs/getting-started/use-cases.md
@@ -0,0 +1,117 @@
# Use Cases

This page covers common use cases for Feast and how a feature store can benefit your AI/ML workflows.

## Recommendation Engines

Recommendation engines require personalized feature data related to users, items, and their interactions. Feast can help by:

- **Managing feature data**: Store and serve user preferences, item characteristics, and interaction history
- **Low-latency serving**: Provide real-time features for dynamic recommendations
- **Point-in-time correctness**: Ensure training and serving data are consistent to avoid data leakage
- **Feature reuse**: Allow different recommendation models to share the same feature definitions

### Example: User-Item Recommendations

A typical recommendation engine might need features such as:
- User features: demographics, preferences, historical behavior
- Item features: categories, attributes, popularity scores
- Interaction features: past user-item interactions, ratings

Feast allows you to define these features once and reuse them across different recommendation models, ensuring consistency between training and serving environments.
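
As a sketch of what that looks like in code (the entity, feature, and path names here are hypothetical, not from a specific Feast example), user features can be declared once and then fetched at serving time through the same definitions:

```python
from datetime import timedelta

from feast import Entity, FeatureStore, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Hypothetical entities for a recommender
user = Entity(name="user_id", description="User ID")
item = Entity(name="item_id", description="Item ID")

# Hypothetical user features, defined once and shared across models
user_stats = FeatureView(
    name="user_stats",
    entities=[user],
    schema=[
        Field(name="avg_rating_30d", dtype=Float32),
        Field(name="purchase_count_30d", dtype=Int64),
    ],
    source=FileSource(path="data/user_stats.parquet", timestamp_field="event_timestamp"),
    ttl=timedelta(days=1),
)

# At serving time, the same definitions back low-latency online lookups
store = FeatureStore(repo_path=".")
features = store.get_online_features(
    features=["user_stats:avg_rating_30d", "user_stats:purchase_count_30d"],
    entity_rows=[{"user_id": 1}],
).to_dict()
```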

{% content-ref url="../tutorials/tutorials-overview/driver-ranking-with-feast.md" %}
[Driver Ranking Tutorial](../tutorials/tutorials-overview/driver-ranking-with-feast.md)
{% endcontent-ref %}

## Risk Scorecards

Risk scorecards (such as credit risk, fraud risk, and marketing propensity models) require a comprehensive view of entity data with historical context. Feast helps by:

- **Feature consistency**: Ensure all models use the same feature definitions
- **Historical feature retrieval**: Generate training datasets with correct point-in-time feature values
- **Feature monitoring**: Track feature distributions to detect data drift
- **Governance**: Maintain an audit trail of features used in regulated environments

### Example: Credit Risk Scoring

Credit risk models might use features like:
- Transaction history patterns
- Account age and status
- Payment history features
- External credit bureau data
- Employment and income verification

Feast enables you to combine these features from disparate sources while maintaining data consistency and freshness.
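
For example, a point-in-time correct training set can be assembled with `get_historical_features`. This is a minimal sketch; the feature view and column names are hypothetical:

```python
import pandas as pd

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# One row per labeled observation: the entity plus the timestamp at
# which the credit decision was made.
entity_df = pd.DataFrame(
    {
        "account_id": [1001, 1002],
        "event_timestamp": pd.to_datetime(["2023-06-01", "2023-06-15"]),
        "defaulted": [0, 1],  # label
    }
)

# Feast joins each feature as of the row's event_timestamp, so no
# post-decision data leaks into training.
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "account_stats:account_age_days",
        "payment_history:late_payments_90d",
    ],
).to_df()
```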

{% content-ref url="../tutorials/tutorials-overview/real-time-credit-scoring-on-aws.md" %}
[Real-time Credit Scoring on AWS](../tutorials/tutorials-overview/real-time-credit-scoring-on-aws.md)
{% endcontent-ref %}

{% content-ref url="../tutorials/tutorials-overview/fraud-detection.md" %}
[Fraud Detection on GCP](../tutorials/tutorials-overview/fraud-detection.md)
{% endcontent-ref %}

## NLP / RAG / Information Retrieval

Natural Language Processing (NLP) and Retrieval Augmented Generation (RAG) applications require efficient storage and retrieval of text embeddings. Feast supports these use cases by:

- **Vector storage**: Store and index embedding vectors for efficient similarity search
- **Document metadata**: Associate embeddings with metadata for contextualized retrieval
- **Scaling retrieval**: Serve vectors with low latency for real-time applications
- **Versioning**: Track changes to embedding models and document collections

### Example: Retrieval Augmented Generation

RAG systems can leverage Feast to:
- Store document embeddings and chunks in a vector database
- Retrieve contextually relevant documents for user queries
- Combine document retrieval with entity-specific features
- Scale to large document collections

Feast makes data available for retrieval through a single, simple API for both storing and querying vector embeddings.
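
A minimal sketch of the query side (the feature view name and embedding are placeholders; the tutorial linked below walks through the full setup):

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")
query_embedding = [0.1] * 384  # placeholder: produced by your embedding model

# Retrieve the top-k most similar document chunks for the query embedding
docs = store.retrieve_online_documents_v2(
    features=["embedded_documents:vector", "embedded_documents:sentence_chunks"],
    query=query_embedding,
    top_k=3,
).to_df()
```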

{% content-ref url="../tutorials/rag-with-docling.md" %}
[RAG with Feast Tutorial](../tutorials/rag-with-docling.md)
{% endcontent-ref %}

## Time Series Forecasting

Time series forecasting for demand planning, inventory management, and anomaly detection benefits from Feast through:

- **Temporal feature management**: Store and retrieve time-bound features
- **Feature engineering**: Create time-based aggregations and transformations
- **Consistent feature retrieval**: Ensure training and inference use the same feature definitions
- **Backfilling capabilities**: Generate historical features for model training

### Example: Demand Forecasting

Demand forecasting applications typically use features such as:
- Historical sales data with temporal patterns
- Seasonal indicators and holiday flags
- Weather data
- Price changes and promotions
- External economic indicators

Feast allows you to combine these diverse data sources and make them available for both batch training and online inference.
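
As a sketch (the feature view and column names are hypothetical), the same definitions serve both a backfilled training set and fresh online lookups:

```python
import pandas as pd

from feast import FeatureStore

store = FeatureStore(repo_path=".")
forecast_features = [
    "sales_stats:sales_28d_avg",
    "calendar:is_holiday",
    "weather:avg_temp_7d",
]

# Backfill for training: feature values as of each historical forecast date
entity_df = pd.DataFrame(
    {
        "store_id": [1, 1, 2],
        "event_timestamp": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-01-01"]),
    }
)
training_df = store.get_historical_features(
    entity_df=entity_df, features=forecast_features
).to_df()

# Online inference: the freshest values of the same features
online_features = store.get_online_features(
    features=forecast_features, entity_rows=[{"store_id": 1}]
).to_dict()
```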

## Image and Multi-Modal Processing

While Feast was initially built for structured data, it can also support multi-modal applications by:

- **Storing feature metadata**: Keep track of image paths, embeddings, and metadata
- **Vector embeddings**: Store image embeddings for similarity search
- **Feature fusion**: Combine image features with structured data features
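
A sketch of feature fusion (names are hypothetical): one feature view can carry an image embedding alongside structured fields and a pointer to the raw image:

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Array, Float32, String

product = Entity(name="product_id", description="Product ID")

product_features = FeatureView(
    name="product_features",
    entities=[product],
    schema=[
        Field(name="image_embedding", dtype=Array(Float32)),  # from a vision model
        Field(name="image_path", dtype=String),  # pointer to the raw image
        Field(name="category", dtype=String),  # structured metadata
    ],
    source=FileSource(
        path="data/products.parquet", timestamp_field="event_timestamp"
    ),
    ttl=timedelta(days=7),
)
```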

## Why Feast Is Impactful

Across all these use cases, Feast provides several core benefits:

1. **Consistency between training and serving**: Eliminate training-serving skew by using the same feature definitions
2. **Feature reuse**: Define features once and use them across multiple models
3. **Scalable feature serving**: Serve features at low latency for production applications
4. **Feature governance**: Maintain a central registry of feature definitions with metadata
5. **Data freshness**: Keep online features up-to-date with batch and streaming ingestion
6. **Reduced operational complexity**: Standardize feature access patterns across models

By implementing a feature store with Feast, teams can focus on model development rather than data engineering challenges, accelerating the delivery of ML applications to production.
236 changes: 236 additions & 0 deletions docs/tutorials/rag-with-docling.md
@@ -0,0 +1,236 @@
# Retrieval Augmented Generation (RAG) with Feast

This tutorial demonstrates how to use Feast with [Docling](https://github.com/docling-project/docling) and [Milvus](https://milvus.io/) to build a Retrieval Augmented Generation (RAG) application. You'll learn how to store document embeddings in Feast and retrieve the most relevant documents for a given query.

## Overview

RAG is a technique that combines generative models (e.g., LLMs) with retrieval systems so that generated output is grounded in contextually relevant data for a particular goal (e.g., question answering). Feast makes it easy to store and retrieve document embeddings for RAG applications by providing integrations with vector databases like Milvus.

The typical RAG process involves:
1. Sourcing text data relevant for your application
2. Transforming each text document into smaller chunks of text
3. Transforming those chunks of text into embeddings
4. Inserting those chunks, along with identifiers for each chunk and its source document, into a database
5. Retrieving the most relevant chunks (and their identifiers) at run time and injecting that text into the LLM's context
6. Calling some API to run inference with your LLM to generate contextually relevant output
7. Returning the output to some end user
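
Steps 1–3 are where Docling fits in: it parses source documents and splits them into chunks ready for embedding. A minimal sketch, assuming Docling's `DocumentConverter` and `HybridChunker` interfaces (check the Docling documentation for the current API):

```python
from docling.document_converter import DocumentConverter
from docling.chunking import HybridChunker

# Parse a source document (local path or URL) into a structured DoclingDocument
converter = DocumentConverter()
result = converter.convert("https://arxiv.org/pdf/2408.09869")

# Split the parsed document into chunks sized for an embedding model
chunker = HybridChunker()
chunks = [chunk.text for chunk in chunker.chunk(result.document)]
```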

## Prerequisites

- Python 3.10 or later
- Feast installed with Milvus support: `pip install feast[milvus]`
- A basic understanding of feature stores and vector embeddings

## Step 1: Configure Milvus in Feast

Create a `feature_store.yaml` file with the following configuration:

```yaml
project: rag
provider: local
registry: data/registry.db
online_store:
  type: milvus
  path: data/online_store.db
  vector_enabled: true
  embedding_dim: 384
  index_type: "IVF_FLAT"

offline_store:
  type: file

entity_key_serialization_version: 3
# Defaults to no_auth; other supported values are kubernetes and oidc.
# Refer to the documentation for more details.
auth:
  type: no_auth
```

## Step 2: Define your Data Sources and Views

Create a `feature_repo.py` file to define your entities, data sources, and feature views:

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Array, Float32, Int64, String, UnixTimestamp, ValueType

# Define entities
document = Entity(
    name="document_id",
    description="Document ID",
    value_type=ValueType.INT64,
)

# Define data source
source = FileSource(
    path="data/embedded_documents.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp",
)

# Define the view for retrieval
document_embeddings = FeatureView(
    name="embedded_documents",
    entities=[document],
    schema=[
        Field(
            name="vector",
            dtype=Array(Float32),
            vector_index=True,  # Vector search enabled
            vector_search_metric="COSINE",  # Distance metric configured
        ),
        Field(name="document_id", dtype=Int64),
        Field(name="created_timestamp", dtype=UnixTimestamp),
        Field(name="sentence_chunks", dtype=String),
        Field(name="event_timestamp", dtype=UnixTimestamp),
    ],
    source=source,
    ttl=timedelta(hours=24),
)
```

## Step 3: Update your Registry

Apply the feature view definitions to the registry:

```bash
feast apply
```

## Step 4: Ingest your Data

Process your documents, generate embeddings, and ingest them into the Feast online store:

```python
from feast import FeatureStore
import pandas as pd
import numpy as np
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

# Initialize the feature store
store = FeatureStore(".")

# Mean pooling over token embeddings, weighted by the attention mask
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9
    )

# Generate normalized sentence embeddings for a list of texts
def generate_embeddings(sentences, tokenizer, model):
    encoded_input = tokenizer(
        sentences, padding=True, truncation=True, return_tensors="pt"
    )
    with torch.no_grad():
        model_output = model(**encoded_input)
    sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])
    sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
    return sentence_embeddings.detach().cpu().numpy()

# Example data
data = {
    "document_id": [1, 2, 3],
    "sentence_chunks": [
        "New York City is the most populous city in the United States.",
        "Los Angeles is the second most populous city in the United States.",
        "Chicago is the third most populous city in the United States.",
    ],
    "event_timestamp": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-01"]),
    "created_timestamp": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-01"]),
}

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

# Generate embeddings
embeddings = generate_embeddings(data["sentence_chunks"], tokenizer, model)

# Create DataFrame with embeddings
df = pd.DataFrame(data)
df["vector"] = embeddings.tolist()

# Write to the online store
store.write_to_online_store(feature_view_name="embedded_documents", df=df)
```

## Step 5: Retrieve Relevant Documents

Now you can retrieve the most relevant documents for a given query:

```python
from feast import FeatureStore
from transformers import AutoTokenizer, AutoModel

# Initialize the feature store
store = FeatureStore(".")

# Generate the query embedding, reusing generate_embeddings from Step 4
query = "What is the largest city in the US?"
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
query_embedding = generate_embeddings([query], tokenizer, model)[0].tolist()

# Retrieve the most similar documents
context_data = store.retrieve_online_documents_v2(
    features=[
        "embedded_documents:vector",
        "embedded_documents:document_id",
        "embedded_documents:sentence_chunks",
    ],
    query=query_embedding,
    top_k=3,
    distance_metric="COSINE",
).to_df()

print(context_data)
```

## Step 6: Use Retrieved Documents for Generation

Finally, you can use the retrieved documents as context for an LLM:

```python
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
)

# Format the retrieved documents into a context block for the prompt
def format_documents(context_data, base_prompt):
    documents = "\n".join(
        f"Document {i + 1}: {row['embedded_documents__sentence_chunks']}"
        for i, row in context_data.iterrows()
    )
    return f"{base_prompt}\n\nContext documents:\n{documents}"

BASE_PROMPT = """You are a helpful assistant that answers questions based on the provided context."""
FULL_PROMPT = format_documents(context_data, BASE_PROMPT)

# Generate a response grounded in the retrieved context
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": FULL_PROMPT},
        {"role": "user", "content": query},
    ],
)

print(response.choices[0].message.content)
```

## Why Feast for RAG?

Feast simplifies setting up and managing a RAG system by:

1. Simplifying vector database configuration and management
2. Providing a consistent API for both writing and reading embeddings
3. Supporting both batch and real-time data ingestion
4. Enabling versioning and governance of your document repository
5. Offering seamless integration with multiple vector database backends
6. Providing a unified API for managing both feature data and document embeddings

For more details on using vector databases with Feast, see the [Vector Database documentation](../reference/alpha-vector-database.md).

The complete demo code is available in the [GitHub repository](https://github.com/feast-dev/feast/tree/master/examples/rag-docling).