feat: Adding blog on RAG with Milvus #5161
Merged: franciscojavierarceo merged 11 commits into feast-dev:master from franciscojavierarceo:milvus-blog-post on Apr 3, 2025

Commits (11, all by franciscojavierarceo):
- 946ffa3 feat: Adding blog on RAG with Milvus
- 13e4330 minor changes
- 65211f5 Adding diagram
- a9b2a1a Rename image.png to milvus-rag.png
- 027cb4b Update retrieval-augmentation-with-feast.md
- 6d7e1ac Merge branch 'feast-dev:master' into milvus-blog-post
- 4685750 incorporating Willem's feedback
- ee53d42 adjust blog and image
- 7e1e3d2 updated copy
- ce1b114 adding github link
- 5de9019 finished blog post, good enough

---
title: Retrieval Augmented Generation with Feast
description: How Feast empowers ML Engineers to ship RAG applications to Production.
date: 2025-03-17
authors: ["Francisco Javier Arceo"]
---

<div class="hero-image">
  <img src="/images/blog/space.jpg" alt="Exploring the Possibilities of AI" loading="lazy">
</div>

## Why Feature Stores Make Sense for GenAI and RAG

Feature stores have been developed over the [past decade](./what-is-a-feature-store) to address the challenges AI practitioners face in managing, serving, and scaling machine learning models in production.

Some of the key challenges include:
* Accessing the right raw data
* Building features from raw data
* Combining features into training data
* Calculating and serving features in production
* Monitoring features in production

Feast was specifically designed to address these challenges.

These same challenges extend naturally to Generative AI (GenAI) applications, with the exception of model training. In GenAI applications, the foundation model is typically pre-trained, and the focus is on fine-tuning the model or simply using it as an endpoint from some provider (e.g., OpenAI, Anthropic, etc.).

For GenAI use cases, feature stores enable the efficient management of context and metadata, both during training/fine-tuning and at inference time.

By using a feature store for your application, you can treat the LLM context, including the prompt, as features. This means you can manage not only input context, document processing, data formatting, tokenization, chunking, and embeddings, but also track and version the context used during model inference, ensuring consistency, transparency, and reproducibility across models and iterations.

With Feast, ML engineers can streamline the embedding generation process, ensure consistency across both offline and online environments, and track the lineage of data and transformations. By leveraging a feature store, GenAI applications benefit from enhanced scalability, maintainability, and reproducibility, making them ideal for complex AI applications and enterprise needs.

## Feast Now Supports RAG

With the rise of generative AI applications, the need to serve vectors has grown quickly. Feast now has alpha support for vector similarity search to power retrieval augmented generation (RAG) systems in production.

<div class="content-image">
  <img src="/images/blog/milvus-rag.png" alt="Retrieval Augmented Generation with Milvus and Feast" loading="lazy">
</div>

This allows ML Engineers and Data Scientists to use the power of their feature store to easily deploy GenAI applications using RAG to production. More importantly, Feast offers the flexibility to customize and scale your production RAG applications through our scalable transformation systems (streaming, request-time, and batch).

## Retrieval Augmented Generation (RAG)
[RAG](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) is a technique that combines generative models (e.g., LLMs) with retrieval systems to generate contextually relevant output for a particular goal (e.g., question answering).

The typical RAG process involves:
1. Sourcing text data relevant to your application
2. Transforming each text document into smaller chunks of text
3. Transforming those chunks of text into embeddings
4. Inserting those chunks of text, along with identifiers for the chunk and document, into some database
5. Retrieving those chunks of text along with the identifiers at run time to inject that text into the LLM's context
6. Calling some API to run inference with your LLM to generate contextually relevant output
7. Returning the output to some end user

Implicit in (1)-(4) is the potential of scaling to large amounts of data (i.e., using some form of distributed computing), orchestrating that scaling through some batch or streaming pipeline, and customizing key transformation decisions (e.g., tokenization, model, chunking, data formatting, etc.).

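Before turning to Feast specifics, here is a minimal, self-contained sketch of that loop. The `embed` and `call_llm` helpers, the toy documents, and the in-memory list standing in for a vector database are all illustrative assumptions; the rest of this post shows how Feast and Milvus take over the storage and retrieval steps.

```python
import numpy as np


def embed(text: str) -> np.ndarray:
    # Placeholder embedding: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.normal(size=384)
    return vec / np.linalg.norm(vec)


def chunk(document: str, size: int = 200) -> list[str]:
    # Step 2: split each document into smaller chunks of text.
    return [document[i:i + size] for i in range(0, len(document), size)]


documents = {
    "doc-1": "Feast is an open source feature store...",
    "doc-2": "Milvus is an open source vector database...",
}

# Steps 3-4: embed each chunk and store it with document/chunk identifiers.
index = []  # stand-in for a vector database such as Milvus
for doc_id, text in documents.items():
    for chunk_id, piece in enumerate(chunk(text)):
        index.append((doc_id, chunk_id, piece, embed(piece)))

# Step 5: retrieve the most similar chunks for a query at run time.
query_vec = embed("What is Feast?")
top_chunks = sorted(index, key=lambda row: -float(row[3] @ query_vec))[:3]
context = "\n".join(piece for _, _, piece, _ in top_chunks)

# Step 6: inject the retrieved text into the LLM's context and run inference.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: What is Feast?"
# response = call_llm(prompt)  # e.g., an OpenAI/Anthropic API call
```
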
## Powering Retrieval in Production
To power the retrieval step of RAG in production, we need to handle data ingestion, data transformation, indexing, and serving web requests from an API.

Building highly available software that can handle these requirements and scale as your data scales is a non-trivial task. This is a strength of Feast: the power of Kubernetes, large-scale data frameworks like Spark and Flink, and the ability to ingest and transform data in real time through the Feast Feature Server make for a powerful combination.

## Beyond Vector Similarity Search
RAG patterns often use vector similarity search for the retrieval step, but it is not the only retrieval pattern that can be useful. In fact, standard entity-based retrieval can be very powerful for applications where relevant user context is necessary.

For example, many RAG applications are customer chat bots, and they benefit significantly from user data (e.g., account balance, location, etc.) to generate contextually relevant output. Feast can help you manage this user data using its existing entity-based retrieval patterns, as sketched below.

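For a rough sketch of what that looks like with Feast's standard online retrieval API: the `user_profile` feature view, its feature names, and the `user_id` entity below are hypothetical placeholders that would be defined in your own feature repository.

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Fetch user context by entity key; the feature view and feature names are illustrative.
user_context = store.get_online_features(
    features=[
        "user_profile:account_balance",
        "user_profile:home_location",
    ],
    entity_rows=[{"user_id": 1234}],
).to_dict()

# The returned values can then be appended to the LLM prompt alongside retrieved documents.
```
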
## The Benefits of Feast
Fine-tuning is the holy grail for optimizing your RAG systems, and by logging the documents/data and the context retrieved during inference, you can fine-tune both the generator and *the retriever* of your LLM application for your particular needs.

This means that Feast can help you not only serve your documents, user data, and other metadata for production RAG applications, but also scale your embeddings over large amounts of data (e.g., using Spark to embed gigabytes of documents), re-use the same code online and offline, track changes to your transformations, data sources, and RAG sources to provide replayability and data lineage, and prepare your datasets so you can fine-tune your embedding, retrieval, or generator models later.

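As an illustration of that logging idea, here is a minimal sketch that captures each retrieval and generation as a JSON Lines record for later fine-tuning datasets. This is plain Python rather than a Feast API; in practice you might land these records in your offline store or data warehouse instead.

```python
import json
from datetime import datetime, timezone


def log_rag_interaction(query: str, retrieved_chunks: list[str], response: str,
                        path: str = "rag_interactions.jsonl") -> None:
    """Append one retrieval + generation record for building fine-tuning datasets later."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "retrieved_chunks": retrieved_chunks,  # what the retriever returned
        "response": response,                  # what the generator produced
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```
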
Historically, Feast catered to Data Scientists and ML Engineers who implemented their own data/feature transformations, but many RAG providers now handle this out of the box for you. We will invest in creating extendable implementations to make it easier to ship your applications.

## Feast Powered by Milvus

[Milvus](https://milvus.io/) is a high-performance open source vector database that provides a powerful and efficient way to store and retrieve embeddings. By using Feast with Milvus, you can easily deploy RAG applications to production and scale your retrieval systems on Kubernetes using the Feast Operator or the [Feature Server Helm Chart](https://github.com/feast-dev/feast/tree/master/infra/charts/feast-feature-server).

This tutorial will walk you through building a basic RAG application with Milvus and Feast; i.e., ingesting embedded documents into Milvus and retrieving the most similar documents for a given query embedding.

This example consists of 5 steps:
1. Configuring Milvus
2. Defining your Data Sources and Views
3. Updating your Registry
4. Ingesting the Data
5. Retrieving the Data

The full demo is available in our [GitHub repository](https://github.com/feast-dev/feast/tree/master/examples/rag).

### Step 1: Configure Milvus
Configure Milvus in a simple `yaml` file.
```yaml
project: rag
provider: local
registry: data/registry.db
online_store:
  type: milvus
  path: data/online_store.db
  vector_enabled: true
  embedding_dim: 384
  index_type: "IVF_FLAT"

offline_store:
  type: file
entity_key_serialization_version: 3
# By default, authentication and authorization use no_auth; other possible values are kubernetes and oidc. Refer to the documentation for more details.
auth:
  type: no_auth
```

### Step 2: Define your Data Sources and Views
You define your data declaratively using Feast's `FeatureView` and `Entity` objects, which are meant to give your software engineers and data scientists an easy, common language for defining the data they want to ship to production.

Here is an example of how you might define a `FeatureView` for document retrieval. Notice how we define the `vector` field and enable vector search by setting `vector_index=True` and the distance metric to `COSINE`.

That's it; the rest of the implementation is already handled for you by Feast and Milvus.

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource, ValueType
from feast.data_format import ParquetFormat
from feast.types import Array, Float32, String

document = Entity(
    name="document_id",
    description="Document ID",
    value_type=ValueType.INT64,
)

source = FileSource(
    file_format=ParquetFormat(),
    path="./data/my_data.parquet",
    timestamp_field="event_timestamp",
)

# Define the view for retrieval
city_embeddings_feature_view = FeatureView(
    name="city_embeddings",
    entities=[document],
    schema=[
        Field(
            name="vector",
            dtype=Array(Float32),
            vector_index=True,  # Vector search enabled
            vector_search_metric="COSINE",  # Distance metric configured
        ),
        Field(name="state", dtype=String),
        Field(name="sentence_chunks", dtype=String),
        Field(name="wiki_summary", dtype=String),
    ],
    source=source,
    ttl=timedelta(hours=2),
)
```

### Step 3: Update your Registry
After we have defined our code, we run `feast apply` in the same folder as the `feature_store.yaml` file to update the registry with our metadata.
```bash
feast apply
```

### Step 4: Ingest your Data
Now that we have defined our metadata, we can ingest our data into Milvus using the following code (see the sketch after it for how `store` and `df` can be constructed):
```python
store.write_to_online_store(feature_view_name='city_embeddings', df=df)
```

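Here, `store` is a `FeatureStore` pointed at the repo containing `feature_store.yaml`, and `df` is a dataframe whose columns match the feature view's schema. The snippet below is only a sketch; the values and the 384-dimensional placeholder vector are illustrative, and in practice the `vector` column would come from your embedding model.

```python
from datetime import datetime

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Illustrative dataframe; the "vector" column must match the 384-dim schema defined above.
df = pd.DataFrame(
    {
        "document_id": [1],
        "vector": [[0.1] * 384],  # placeholder embedding
        "state": ["New York"],
        "sentence_chunks": ["New York City is the most populous city in the United States."],
        "wiki_summary": ["New York, often called New York City..."],
        "event_timestamp": [datetime.now()],
    }
)

store.write_to_online_store(feature_view_name="city_embeddings", df=df)
```
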
### Step 5: Retrieve your Data
Now that the data is actually stored in Milvus, we can easily query it using the SDK (and corresponding REST API) to retrieve the most similar documents for a given query embedding.
```python
context_data = store.retrieve_online_documents_v2(
    features=[
        "city_embeddings:vector",
        "city_embeddings:document_id",
        "city_embeddings:state",
        "city_embeddings:sentence_chunks",
        "city_embeddings:wiki_summary",
    ],
    query=query_embedding,
    top_k=3,
    distance_metric='COSINE',
).to_df()
```

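From there, a natural next step is to format the retrieved rows into the LLM prompt, closing the loop described in the RAG section above. A minimal sketch, assuming `query_embedding` was produced by the same 384-dimensional model used at ingestion time and that the returned dataframe exposes a `sentence_chunks` column:

```python
# Join the retrieved chunks into a single context block for the LLM.
context_block = "\n\n".join(context_data["sentence_chunks"].tolist())

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context_block}\n\n"
    "Question: Which city has the highest population?"
)

# response = some_llm_client.generate(prompt)  # e.g., OpenAI, Anthropic, or a local model
```
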
### The Benefits of Using Feast for RAG
We've discussed some of the high-level benefits of using Feast for a RAG application. More specifically, here are some of the concrete benefits you can expect from using Feast for RAG:
1. [Real-time, stream, and batch data ingestion](https://docs.feast.dev/getting-started/concepts/data-ingestion) support to the Feature Server for online retrieval
2. [Data dictionary/metadata catalog](https://docs.feast.dev/getting-started/components/registry) autogenerated from code
3. [UI exposing the metadata catalog](https://docs.feast.dev/reference/alpha-web-ui)
4. [FastAPI Server](https://docs.feast.dev/getting-started/components/feature-server) to serve your data
5. [Role Based Access Control (RBAC)](https://docs.feast.dev/getting-started/concepts/permission) to govern access to your data
6. [Deployment on Kubernetes](https://docs.feast.dev/how-to-guides/running-feast-in-production) using our Helm Chart or our Operator
7. [Multiple vector database providers](https://docs.feast.dev/reference/alpha-vector-database)
8. [Multiple data warehouse providers](https://docs.feast.dev/reference/offline-stores/overview#functionality-matrix)
9. Support for different [data sources](https://docs.feast.dev/reference/data-sources/overview#functionality-matrix)
10. Support for stream and [batch processors (e.g., Spark and Flink)](https://docs.feast.dev/tutorials/building-streaming-features)

And more!

## The Future of Feast and GenAI

Feast will continue to invest in GenAI use cases.

In particular, we will invest in (1) NLP as a first-class citizen, (2) support for images, (3) support for transforming unstructured data (e.g., PDFs), (4) an enhanced GenAI-focused feature server to allow our end users to more easily ship RAG to production, (5) an out-of-the-box chat UI meant for internal development and fast iteration, and (6) making [Milvus](https://milvus.io/intro) a fully supported and core online store for RAG.

## Join the Conversation

Are you interested in learning more about how Feast can help you build and deploy RAG applications to production? Reach out to us on Slack or [GitHub](https://github.com/feast-dev/feast); we'd love to hear from you!

lgtm!