feast-dev · patelchaitany · Mar 17, 2026 · Mar 24, 2026 · May 19, 2026 · franciscojavierarceo
@@ -37,6 +37,8 @@ The Feature Server operates as a stateless service backed by two key components:
 | `/push`                      | Pushes feature data to the online and/or offline store.                 |
 | `/materialize`               | Materializes features within a specific time range to the online store. |
 | `/materialize-incremental`   | Incrementally materializes features up to the current timestamp.        |
-| `/retrieve-online-documents` | Supports Vector Similarity Search for RAG (Alpha end-ponit)             |
+| `/search` | Vector similarity search for RAG (Alpha endpoint) |
+| `/v1/vector_stores/{id}/search` | OpenAI-compatible vector search with server-side embedding |
+| `/retrieve-online-documents` | **Deprecated.** Use `/search` instead. |
 | `/docs`                      | API Contract for available endpoints                                    | 
 
@@ -225,7 +225,8 @@ The MCP integration uses the `fastapi_mcp` library to automatically transform yo
 The fastapi_mcp integration automatically exposes your Feast feature server's FastAPI endpoints as MCP tools. This means AI assistants can:
 
 * **Call `/get-online-features`** to retrieve features from the feature store
-* **Call `/retrieve-online-documents`** to perform vector similarity search
+* **Call `/search`** to perform vector similarity search (`/retrieve-online-documents` is a deprecated alias)
+* **Call `/v1/vector_stores/{feature_view}/search`** for OpenAI-compatible text search with server-side embedding
 * **Call `/write-to-online-store`** to persist agent state (memory, notes, interaction history)
 * **Use `/health`** to check server status  
 

@@ -32,6 +32,109 @@ backwards compatibility and the adopt industry standard naming conventions.
 
 **Note**: Milvus and SQLite implement the v2 `retrieve_online_documents_v2` method in the SDK. This will be the longer-term solution so that Data Scientists can easily enable vector similarity search by just flipping a flag.
 
+## Feature server search endpoints
+
+| Endpoint | Use when |
+|----------|----------|
+| `POST /search` | You have a pre-computed embedding vector (or a text query, using `api_version: 2` with `query_string`) and want Feast's native online-features response format. |
+| `POST /v1/vector_stores/{feature_view}/search` | You want plain-text queries with server-side embedding and an OpenAI-compatible response. |
+
+`POST /retrieve-online-documents` is deprecated; use `POST /search` instead.
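The choice in the table above can be sketched as a small client-side helper. The function names are hypothetical, and routing every text query to the OpenAI-compatible endpoint is a simplification: text can also go to `/search` with `api_version: 2` when the online store supports it.

```python
from typing import Sequence, Union
from urllib.parse import quote

# Deprecated alias of /search.
DEPRECATED_SEARCH_PATH = "/retrieve-online-documents"


def pick_search_endpoint(query: Union[str, Sequence[float]], feature_view: str) -> str:
    """Choose a search endpoint from the query type (client-side heuristic)."""
    if isinstance(query, str):
        # Plain text: the OpenAI-compatible endpoint embeds it server-side.
        return f"/v1/vector_stores/{quote(feature_view, safe='')}/search"
    # Pre-computed embedding vector: Feast's native search endpoint.
    return "/search"


def migrate_path(path: str) -> str:
    """Rewrite the deprecated alias to /search; leave other paths unchanged."""
    return "/search" if path == DEPRECATED_SEARCH_PATH else path
```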
+
+## OpenAI-Compatible Vector Store Search
+
+Feast exposes an OpenAI-compatible vector store search endpoint at `POST /v1/vector_stores/{feature_view}/search`. This endpoint accepts plain text queries, handles embedding server-side, and returns results in the [OpenAI Vector Store Search API](https://platform.openai.com/docs/api-reference/vector-stores-search) format.
+
+This is useful for AI agents and LLM tool-calling workflows where the client cannot produce raw embedding vectors.
+
+### Requirements
+
+- An `embedding_model` section in your `feature_store.yaml` (uses [LiteLLM](https://docs.litellm.ai/) for provider support):
+
+```yaml
+embedding_model:
+  model: text-embedding-3-small
+  api_key: ${OPENAI_API_KEY}
+  # api_base, api_version, dimensions are optional
+```
+
+- A feature view with `vector_index=True` on a vector field, materialized to an online store that supports vector search.
+- For metadata filtering with numeric or boolean comparisons, set `enable_openai_compatible_store: true` on your online store config and run `feast apply` to update the schema.
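As a rough illustration of how the `embedding_model` block could map onto LiteLLM, the sketch below turns it into keyword arguments for `litellm.embedding()`. That Feast forwards the fields exactly this way is an assumption, as is the helper name `embedding_kwargs`. The `${OPENAI_API_KEY}` placeholder is expanded with `os.path.expandvars`, which leaves it untouched if the variable is unset.

```python
import os
from typing import Any, Dict

# Optional fields from the embedding_model block (see the YAML comment above).
OPTIONAL_KEYS = ("api_base", "api_version", "dimensions")


def embedding_kwargs(config: Dict[str, Any]) -> Dict[str, Any]:
    """Build litellm.embedding() keyword arguments from an embedding_model block.

    Illustration only; this is not Feast's actual code.
    """
    kwargs: Dict[str, Any] = {"model": config["model"]}
    api_key = config.get("api_key")
    if api_key is not None:
        # Expand ${VAR}-style placeholders from the environment.
        kwargs["api_key"] = os.path.expandvars(api_key)
    for key in OPTIONAL_KEYS:
        # Only pass optional settings that were actually provided.
        if config.get(key) is not None:
            kwargs[key] = config[key]
    return kwargs


# Usage (requires `pip install litellm` and network access):
# import litellm
# response = litellm.embedding(input=["wireless headphones"], **embedding_kwargs(cfg))
```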
+
+### Usage
+
+Start the feature server with `feast serve`, then send a search request:
+
+```bash
+curl -X POST http://localhost:6566/v1/vector_stores/my_feature_view/search \
+  -H "Content-Type: application/json" \
+  -d '{
+    "query": "wireless noise-cancelling headphones",
+    "max_num_results": 5
+  }'
+```
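The same request from Python, using only the standard library. `build_search_request` and `search` are hypothetical helper names; the URL and JSON body mirror the curl command above.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Optional, Union


def build_search_request(
    base_url: str,
    feature_view: str,
    query: Union[str, List[str]],
    max_num_results: int = 10,
    filters: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible search endpoint."""
    # The feature view name is a single path segment, so escape "/" too.
    path = f"/v1/vector_stores/{urllib.parse.quote(feature_view, safe='')}/search"
    body: Dict[str, Any] = {"query": query, "max_num_results": max_num_results}
    if filters is not None:
        body["filters"] = filters
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url: str, feature_view: str, query: Union[str, List[str]], **kwargs: Any) -> Dict[str, Any]:
    """Send the request and decode the JSON response."""
    req = build_search_request(base_url, feature_view, query, **kwargs)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Usage against a local `feast serve`:
# page = search("http://localhost:6566", "my_feature_view",
#               "wireless noise-cancelling headphones", max_num_results=5)
```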
+
+### Request fields
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `query` | `string` or `list[string]` | (required) | Plain text search query. Lists are joined with spaces before embedding. |
+| `max_num_results` | `int` | `10` | Maximum number of results to return. |
+| `filters` | `object` | `null` | OpenAI-style filters (see below). |
+| `ranking_options` | `object` | `null` | Accepted but not yet applied. |
+| `rewrite_query` | `bool` | `null` | Accepted but not yet applied. |
+| `metadata` | `object` | `null` | Optional. `metadata.features_to_retrieve` selects specific features. |
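A sketch of the request body as a dataclass, using the defaults from the table. The class name is hypothetical, `embedding_input()` only mirrors the documented "lists are joined with spaces" rule on the client side, and the list-of-feature-names shape for `features_to_retrieve` is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union


@dataclass
class VectorStoreSearchRequest:
    query: Union[str, List[str]]
    max_num_results: int = 10
    filters: Optional[Dict[str, Any]] = None
    ranking_options: Optional[Dict[str, Any]] = None  # accepted, not yet applied
    rewrite_query: Optional[bool] = None  # accepted, not yet applied
    metadata: Optional[Dict[str, Any]] = None

    def embedding_input(self) -> str:
        """Text that would be embedded: list queries are joined with spaces."""
        if isinstance(self.query, list):
            return " ".join(self.query)
        return self.query

    def to_json(self) -> Dict[str, Any]:
        """Serialize the body, omitting optional fields left at None."""
        body: Dict[str, Any] = {"query": self.query, "max_num_results": self.max_num_results}
        for name in ("filters", "ranking_options", "rewrite_query", "metadata"):
            value = getattr(self, name)
            if value is not None:
                body[name] = value
        return body
```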
+
+### Filters
+
+The endpoint supports OpenAI-style filters for narrowing results beyond vector similarity.
+
+**Comparison operators:** `eq`, `ne`, `gt`, `gte`, `lt`, `lte`, `in`, `nin`
+
+```json
+{"type": "eq", "key": "category", "value": "Electronics"}
+```
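A small builder that rejects operators outside the list above. That `in` and `nin` take a list of values follows the OpenAI filter format and is an assumption here, as is the helper name `comparison`.

```python
from typing import Any, Dict

COMPARISON_OPS = frozenset({"eq", "ne", "gt", "gte", "lt", "lte", "in", "nin"})


def comparison(op: str, key: str, value: Any) -> Dict[str, Any]:
    """Build a single OpenAI-style comparison filter."""
    if op not in COMPARISON_OPS:
        raise ValueError(f"unsupported comparison operator: {op!r}")
    if op in ("in", "nin"):
        if not isinstance(value, (list, tuple)):
            raise TypeError(f"{op!r} expects a list of values")
        value = list(value)  # normalize tuples to lists (JSON arrays)
    return {"type": op, "key": key, "value": value}


# comparison("eq", "category", "Electronics") builds the JSON example above.
```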
+
+**Compound operators:** `and`, `or` (nest to arbitrary depth)
+
+```json
+{
+  "type": "and",
+  "filters": [
+    {"type": "eq", "key": "category", "value": "Electronics"},
+    {"type": "gte", "key": "rating", "value": 4.5}
+  ]
+}
+```
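To make the nesting semantics concrete, this sketch evaluates a filter tree against one result's `attributes` dict. It is a client-side model, not Feast's implementation; treating a missing attribute as a non-match is an assumption.

```python
import operator
from typing import Any, Callable, Dict, Mapping

# Leaf comparison operators; `in`/`nin` test membership in a list of values.
_COMPARISONS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "in": lambda actual, expected: actual in expected,
    "nin": lambda actual, expected: actual not in expected,
}


def matches(f: Mapping[str, Any], attributes: Mapping[str, Any]) -> bool:
    """Evaluate an OpenAI-style filter tree against one result's attributes."""
    if f["type"] == "and":
        return all(matches(sub, attributes) for sub in f["filters"])
    if f["type"] == "or":
        return any(matches(sub, attributes) for sub in f["filters"])
    if f["key"] not in attributes:
        return False  # assumption: a missing attribute never matches
    return _COMPARISONS[f["type"]](attributes[f["key"]], f["value"])
```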
+
+String equality filters work on all backends. Numeric and boolean filters require `enable_openai_compatible_store: true` in the online store config.
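A conservative pre-flight check, assuming only string `eq` comparisons are guaranteed to work without the flag: anything else (numeric or boolean values, or other operators) is reported as possibly needing `enable_openai_compatible_store: true`. The helper name is hypothetical.

```python
from typing import Any, Mapping


def may_need_compatible_store(f: Mapping[str, Any]) -> bool:
    """True if any leaf goes beyond string equality (conservative heuristic)."""
    if f["type"] in ("and", "or"):
        return any(may_need_compatible_store(sub) for sub in f["filters"])
    # Only an `eq` comparison against a string is documented to work on all backends.
    return not (f["type"] == "eq" and isinstance(f["value"], str))
```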
+
+### Response format
+
+Responses follow the OpenAI `vector_store.search_results.page` schema:
+
+```json
+{
+  "object": "vector_store.search_results.page",
+  "search_query": ["wireless noise-cancelling headphones"],
+  "data": [
+    {
+      "file_id": "my_feature_view_42",
+      "filename": "my_feature_view",
+      "score": 0.92,
+      "attributes": {"name": "...", "category": "..."},
+      "content": [
+        {"type": "text", "text": "..."}
+      ]
+    }
+  ],
+  "has_more": false,
+  "next_page": null
+}
+```
+
+Pagination is not yet implemented; `has_more` is always `false`.
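A sketch that flattens a results page into `(file_id, score, text, attributes)` tuples. `Hit` and `parse_search_page` are hypothetical names; because `has_more` is always `false` for now, one page holds all results and the code does not follow `next_page`.

```python
from typing import Any, Dict, List, NamedTuple


class Hit(NamedTuple):
    file_id: str
    score: float
    text: str
    attributes: Dict[str, Any]


def parse_search_page(page: Dict[str, Any]) -> List[Hit]:
    """Extract hits from a vector_store.search_results.page response."""
    if page.get("object") != "vector_store.search_results.page":
        raise ValueError(f"unexpected object type: {page.get('object')!r}")
    hits = []
    for item in page.get("data", []):
        # Concatenate the text parts of the content list.
        text = "\n".join(
            part["text"] for part in item.get("content", []) if part.get("type") == "text"
        )
        hits.append(Hit(item["file_id"], float(item["score"]), text, item.get("attributes", {})))
    return hits
```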
+
 ## Examples
 
 - See the v0 [Rag Demo](https://github.com/feast-dev/feast-workshop/blob/rag/module_4_rag) for an example of how to use a vector database with the `retrieve_online_documents` method (planning migration and deprecation).

@@ -527,6 +527,72 @@ Prometheus adds an `instance` label per pod, so there is no
 duplication.  Use `sum(rate(...))` or `histogram_quantile(...)` across
 instances as usual.
 
+## Vector Search (`POST /search`)
+
+The feature server exposes `POST /search` for vector similarity search against online document embeddings. Pass a pre-computed embedding in `query`, or use `api_version: 2` with `query_string` for text-based search when the online store supports it.
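A sketch of the two `/search` request shapes. Only `query`, `api_version`, and `query_string` come from the text above; the `features` and `top_k` fields, the `feature_view:feature` reference format, the helper name, and requiring exactly one of the two inputs are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_search_body(
    features: List[str],
    top_k: int,
    embedding: Optional[Sequence[float]] = None,
    query_string: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a /search body from either an embedding or a text query."""
    if (embedding is None) == (query_string is None):
        raise ValueError("pass exactly one of embedding or query_string")
    body: Dict[str, Any] = {"features": features, "top_k": top_k}
    if embedding is not None:
        # Pre-computed vector goes in `query`.
        body["query"] = list(embedding)
    else:
        # Text search needs the v2 API and an online store that supports it.
        body["api_version"] = 2
        body["query_string"] = query_string
    return body
```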
+
+`POST /retrieve-online-documents` is a deprecated alias with the same request body and response; new integrations should use `/search`.
+
+## OpenAI-Compatible Vector Store Search
+
+The feature server exposes an OpenAI-compatible vector store search endpoint. This allows clients (including LLM agents and tool-calling frameworks) to search vector data with plain text queries, without computing embeddings client-side.
+
+### Endpoint
+
+`POST /v1/vector_stores/{vector_store_id}/search`
+
+The `vector_store_id` path parameter is the **feature view name**.
+
+### Configuration
+
+Add an `embedding_model` section to your `feature_store.yaml`:
+
+```yaml
+embedding_model:
+  model: text-embedding-3-small
+  api_key: ${OPENAI_API_KEY}
+```
+
+Any [LiteLLM](https://docs.litellm.ai/)-supported provider works (OpenAI, Ollama, Azure, Cohere, etc.). See [Alpha Vector Database](../alpha-vector-database.md#openai-compatible-vector-store-search) for full configuration and filter details.
+
+### Example
+
+```bash
+curl -X POST http://localhost:6566/v1/vector_stores/product_catalog/search \
+  -H "Content-Type: application/json" \
+  -d '{
+    "query": "wireless noise-cancelling headphones",
+    "max_num_results": 5,
+    "filters": {
+      "type": "eq",
+      "key": "category",
+      "value": "Electronics"
+    }
+  }'
+```
+
+The response follows the OpenAI `vector_store.search_results.page` format:
+
+```json
+{
+  "object": "vector_store.search_results.page",
+  "search_query": ["wireless noise-cancelling headphones"],
+  "data": [
+    {
+      "file_id": "product_catalog_42",
+      "filename": "product_catalog",
+      "score": 0.92,
+      "attributes": {"name": "Sony WH-1000XM5", "category": "Electronics"},
+      "content": [{"type": "text", "text": "Sony WH-1000XM5"}]
+    }
+  ],
+  "has_more": false,
+  "next_page": null
+}
+```
+
+For metadata filtering with numeric or boolean comparisons, set `enable_openai_compatible_store: true` on your online store config and run `feast apply` to update the schema.
+
 ## Starting the feature server in TLS(SSL) mode
 
 Enabling TLS mode ensures that data between the Feast client and server is transmitted securely. For an ideal production environment, it is recommended to start the feature server in TLS mode.
@@ -598,7 +664,9 @@ The [PyTorch NLP template](https://github.com/feast-dev/feast/tree/main/sdk/pyth
 | Endpoint                   | Resource Type                   | Permission                                            | Description                                                    |
 |----------------------------|---------------------------------|-------------------------------------------------------|----------------------------------------------------------------|
 | /get-online-features       | FeatureView,OnDemandFeatureView | Read Online                                           | Get online features from the feature store                     |
-| /retrieve-online-documents | FeatureView                     | Read Online                                           | Retrieve online documents from the feature store for RAG       |
+| /search | FeatureView                     | Read Online                                           | Vector similarity search for RAG (embedding vector or text query) |
+| /retrieve-online-documents | FeatureView              | Read Online                                           | **Deprecated.** Use `/search` instead.                           |
+| /v1/vector_stores/{id}/search | FeatureView                  | Read Online                                           | OpenAI-compatible vector search with server-side embedding     |
 | /push                      | FeatureView                     | Write Online, Write Offline, Write Online and Offline | Push features to the feature store (online, offline, or both)  |
 | /write-to-online-store     | FeatureView                     | Write Online                                          | Write features to the online store                             |
 | /materialize               | FeatureView                     | Write Online                                          | Materialize features within a specified time range             |

@@ -51,7 +51,7 @@ feature_server:
   mcp_server_version: "1.0.0"
 ```
 
-Once enabled, any MCP-compatible agent -- whether built with LangChain, LlamaIndex, CrewAI, AutoGen, or a custom framework -- can connect to `http://your-feast-server/mcp` and discover available tools like `get-online-features` for entity-based retrieval, `retrieve-online-documents` for vector similarity search, and `write-to-online-store` for persisting agent state.
+Once enabled, any MCP-compatible agent -- whether built with LangChain, LlamaIndex, CrewAI, AutoGen, or a custom framework -- can connect to `http://your-feast-server/mcp` and discover available tools like `get-online-features` for entity-based retrieval, `search` for vector similarity search, `vector_store_search` for OpenAI-compatible text search, and `write-to-online-store` for persisting agent state.
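Under the hood, an MCP client invokes these tools with a JSON-RPC 2.0 `tools/call` message. This sketch only builds that message; real agents use an MCP client library that also handles the `initialize` handshake and the transport. The argument names (including `vector_store_id` for the path parameter) are assumptions about how `fastapi_mcp` maps the endpoints, and `tools_call` is a hypothetical helper.

```python
import itertools
import json
from typing import Any, Dict

# Monotonic JSON-RPC request ids.
_ids = itertools.count(1)


def tools_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP `tools/call` JSON-RPC 2.0 request message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


msg = tools_call(
    "vector_store_search",
    {
        "vector_store_id": "product_catalog",
        "query": "wireless noise-cancelling headphones",
        "max_num_results": 5,
    },
)
print(json.dumps(msg, indent=2))
```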
 
 ## A Concrete Example: Customer-Support Agent with Memory