MongoDB data sources are MongoDB collections that can be used as a source for feature data. The MongoDBSource points at a MongoDB collection and provides the metadata Feast needs to read historical features from the offline store's collection.
Defining a MongoDB source:
from feast.infra.offline_stores.contrib.mongodb_offline_store.mongodb import (
    MongoDBSource,
)

driver_stats_source = MongoDBSource(
    name="driver_stats",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_at",
)

The name field becomes the feature_view discriminator stored in every document in the feature_history collection.
Configuration options such as connection_string, database, and collection are inherited from the offline store configuration in feature_store.yaml.
The full set of configuration options is available here.
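To make the role of the feature_view discriminator concrete, the sketch below builds a MongoDB query filter that selects one source's documents within a time window. It is illustrative only: feature_history_filter is a hypothetical helper, not part of Feast, and it assumes each stored document carries the source name under a feature_view key and the event time under the configured timestamp field.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def feature_history_filter(
    source_name: str,
    timestamp_field: str,
    start: datetime,
    end: datetime,
) -> Dict[str, Any]:
    """Build a filter for one source's documents in the half-open window [start, end).

    Assumption: documents store the source name under "feature_view" and the
    event time under `timestamp_field`. The real layout may differ.
    """
    if start >= end:
        raise ValueError("start must be earlier than end")
    return {
        "feature_view": source_name,
        timestamp_field: {"$gte": start, "$lt": end},
    }


query_filter = feature_history_filter(
    "driver_stats",
    "event_timestamp",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)
```

A filter of this shape could be passed to a PyMongo collection's find() method when inspecting the collection by hand.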
The MongoDB online store supports MongoDB Vector Search, enabling similarity search over feature embeddings stored in MongoDB. This is powered by the $vectorSearch aggregation stage and supports MongoDB Atlas, self-hosted MongoDB with Atlas Search indexes, and the mongodb/mongodb-atlas-local Docker image for local development.
Enable vector search in your feature_store.yaml:
project: my_project
provider: local
online_store:
  type: mongodb
  connection_string: mongodb+srv://<user>:<pass>@cluster.mongodb.net  # pragma: allowlist secret
  vector_enabled: true
  similarity: cosine  # cosine | euclidean | dotProduct
  vector_index_wait_timeout: 60  # seconds to wait for index to become queryable
  vector_index_wait_poll_interval: 1.0  # seconds between polls

Mark embedding fields with vector_index=True and specify vector_length:
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Array, Float32, Int64, String
from datetime import timedelta

item_embeddings = FeatureView(
    name="item_embeddings",
    entities=[Entity(name="item_id", join_keys=["item_id"])],
    schema=[
        Field(
            name="embedding",
            dtype=Array(Float32),
            vector_index=True,
            vector_length=384,
            vector_search_metric="cosine",
        ),
        Field(name="title", dtype=String),
        Field(name="item_id", dtype=Int64),
    ],
    source=FileSource(path="items.parquet", timestamp_field="event_timestamp"),
    ttl=timedelta(hours=24),
)

When feast apply (or store.update()) runs with vector_enabled=True, MongoDB vector search indexes are automatically created for any field with vector_index=True. Indexes are also automatically dropped when feature views are removed.
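As a rough illustration of what automatic index creation involves, the sketch below builds a definition in the shape MongoDB expects for a vector search index and polls until a caller-supplied check reports the index as queryable. Both helpers are hypothetical and not Feast's implementation. The path value is an assumption about where the embedding lives in the stored document, and the default timeout and poll interval simply mirror the vector_index_wait_timeout and vector_index_wait_poll_interval values in the configuration example.

```python
import time
from typing import Any, Callable, Dict

SUPPORTED_SIMILARITIES = {"cosine", "euclidean", "dotProduct"}


def vector_index_definition(
    path: str, num_dimensions: int, similarity: str = "cosine"
) -> Dict[str, Any]:
    """Return a MongoDB vector search index definition for one field.

    `path` is the (assumed) document path of the embedding array.
    """
    if similarity not in SUPPORTED_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity!r}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


def wait_until_queryable(
    is_queryable: Callable[[], bool],
    timeout: float = 60.0,
    poll_interval: float = 1.0,
) -> bool:
    """Poll `is_queryable` until it returns True or `timeout` seconds elapse.

    Returns True if the index became queryable in time, otherwise False.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_queryable():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


definition = vector_index_definition("embedding", 384, "cosine")
```

In a real deployment the is_queryable callback would inspect the index status reported by the server; here it is left to the caller so the sketch stays independent of any driver.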
Use retrieve_online_documents_v2() to perform similarity search:
from feast import FeatureStore

store = FeatureStore(repo_path=".")
results = store.retrieve_online_documents_v2(
    features=["item_embeddings:embedding", "item_embeddings:title"],
    query=[0.1, 0.2, ...],  # query vector
    top_k=5,
)

- Index creation: update() creates a MongoDB vector search index named <feature_view>__<field>__vs_index for each vector-indexed field. It waits for the index to reach READY status before proceeding.
- Query execution: retrieve_online_documents_v2() builds a $vectorSearch aggregation pipeline with numCandidates = max(top_k * 10, 100) and the specified limit.
- Score: Results include a distance field populated from $meta: "vectorSearchScore".
- BSON compatibility: Query vectors are coerced to native Python floats to avoid numpy serialization issues.
- Idempotency: Calling update() multiple times will not duplicate indexes.
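The query-side behavior in the list above can be sketched as a small pipeline builder. This is a minimal illustration under stated assumptions, not Feast's code: build_vector_search_pipeline and vector_index_name are hypothetical helpers, and the path argument (where the embedding sits in the stored document) and the $project stage layout are assumptions. The index naming pattern, the numCandidates formula, the distance field, and the float coercion follow the description above.

```python
from typing import Any, Dict, Iterable, List, Sequence


def vector_index_name(feature_view: str, field: str) -> str:
    """Index name following the <feature_view>__<field>__vs_index pattern."""
    return f"{feature_view}__{field}__vs_index"


def build_vector_search_pipeline(
    feature_view: str,
    field: str,
    path: str,
    query: Iterable[Any],
    top_k: int,
    projected_fields: Sequence[str] = (),
) -> List[Dict[str, Any]]:
    """Build a two-stage aggregation pipeline: $vectorSearch, then $project."""
    # Coerce every element to a native Python float so the vector is
    # BSON-serializable even if it came from a numpy array.
    query_vector = [float(x) for x in query]

    # Oversample candidates: ten times top_k, but never fewer than 100.
    num_candidates = max(top_k * 10, 100)

    # Keep the requested fields and expose the search score as "distance".
    projection: Dict[str, Any] = {name: 1 for name in projected_fields}
    projection["distance"] = {"$meta": "vectorSearchScore"}

    return [
        {
            "$vectorSearch": {
                "index": vector_index_name(feature_view, field),
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": top_k,
            }
        },
        {"$project": projection},
    ]


pipeline = build_vector_search_pipeline(
    "item_embeddings", "embedding", "embedding", [1, 2.5], top_k=5,
    projected_fields=["title"],
)
```

With top_k=5 the floor of 100 candidates applies; the top_k * 10 term only takes over once top_k exceeds 10.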
MongoDB data sources support all eight primitive types (bytes, string, int32, int64, float32, float64, bool, timestamp) and their corresponding array types. Complex types such as Map and Struct are preserved through the MongoDB document model.
For a comparison against other batch data sources, please see here.