In-memory vector store implementation.
Stores entries in a dictionary and computes cosine similarity for search using NumPy.
InMemoryVectorStore(
    embedding: Embeddings,
)

Setup:
Install langchain-core.
pip install -U langchain-core
Key init args — indexing params:
    embedding: Embeddings
        Embedding function to use.
Instantiate:
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings
vector_store = InMemoryVectorStore(OpenAIEmbeddings())
Add Documents:
from langchain_core.documents import Document
document_1 = Document(id="1", page_content="foo", metadata={"baz": "bar"})
document_2 = Document(id="2", page_content="thud", metadata={"bar": "baz"})
document_3 = Document(id="3", page_content="i will be deleted :(")
documents = [document_1, document_2, document_3]
vector_store.add_documents(documents=documents)
Inspect documents:
top_n = 10
for index, (id, doc) in enumerate(vector_store.store.items()):
    if index < top_n:
        # docs have keys 'id', 'vector', 'text', 'metadata'
        print(f"{id}: {doc['text']}")
    else:
        break
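As the inspection loop suggests, the store is essentially an id-keyed dictionary. A rough pure-Python stand-in (a hypothetical sketch, not the real implementation; the real class also embeds each text before storing its vector):

```python
# Hypothetical stand-in for the id-keyed dict behind InMemoryVectorStore.
store: dict[str, dict] = {}

def add(doc_id: str, vector: list[float], text: str, metadata: dict) -> None:
    # Re-adding an existing id overwrites the previous entry in this sketch.
    store[doc_id] = {"id": doc_id, "vector": vector, "text": text, "metadata": metadata}

add("1", [0.1, 0.2], "foo", {"baz": "bar"})
add("2", [0.3, 0.4], "thud", {"bar": "baz"})
```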
Delete Documents:
vector_store.delete(ids=["3"])
Search:
results = vector_store.similarity_search(query="thud", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
* thud [{'bar': 'baz'}]
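Under the hood, the ranking relies on the cosine similarity mentioned at the top. A plain-NumPy sketch of that computation (an illustration, not langchain-core's actual code):

```python
import numpy as np

def cosine_similarity(query: np.ndarray, vectors: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and a matrix of stored vectors."""
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
    return (vectors @ query) / norms

# Three toy 2-d "embeddings" ranked against a query.
stored = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.0])
scores = cosine_similarity(query, stored)
top = np.argsort(scores)[::-1]  # indices sorted by descending similarity
```

`similarity_search` embeds the query once and returns the `k` highest-scoring documents.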
Search with filter:
def _filter_function(doc: Document) -> bool:
    return doc.metadata.get("bar") == "baz"

results = vector_store.similarity_search(
    query="thud", k=1, filter=_filter_function
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
* thud [{'bar': 'baz'}]
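Because `filter` accepts any callable from `Document` to `bool`, predicates can be composed with ordinary Python. The helpers below are hypothetical illustrations (not part of langchain-core), demonstrated on a `SimpleNamespace` standing in for a `Document`:

```python
from types import SimpleNamespace

# Hypothetical helpers: build reusable predicates and AND them together.
def has_metadata(key, value):
    def _pred(doc) -> bool:
        return doc.metadata.get(key) == value
    return _pred

def all_of(*preds):
    def _pred(doc) -> bool:
        return all(p(doc) for p in preds)
    return _pred

# SimpleNamespace stands in for a Document here, for demonstration only.
doc = SimpleNamespace(metadata={"bar": "baz", "lang": "en"})
match = all_of(has_metadata("bar", "baz"), has_metadata("lang", "en"))(doc)
```

A composed predicate is passed the same way as `_filter_function` above, e.g. `filter=all_of(has_metadata("bar", "baz"), has_metadata("lang", "en"))`.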
Search with score:
results = vector_store.similarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.832268] foo [{'baz': 'bar'}]
Async:
# add documents
# await vector_store.aadd_documents(documents=documents)
# delete documents
# await vector_store.adelete(ids=["3"])
# search
# results = await vector_store.asimilarity_search(query="thud", k=1)
# search with score
results = await vector_store.asimilarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.832268] foo [{'baz': 'bar'}]
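The `await` calls above only work inside a coroutine. A minimal driver pattern looks like this, with a stand-in coroutine in place of the store so the sketch runs on its own (replace `fake_search` with e.g. `vector_store.asimilarity_search`):

```python
import asyncio

# Stand-in coroutine so the pattern is runnable without a live store.
async def fake_search(query: str, k: int) -> list[str]:
    return ["thud"][:k]

async def main() -> list[str]:
    # Every a*-prefixed store method is awaited the same way.
    return await fake_search("thud", k=1)

results = asyncio.run(main())
```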
Use as Retriever:
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 1, "fetch_k": 2, "lambda_mult": 0.5},
)
retriever.invoke("thud")
[Document(id='2', metadata={'bar': 'baz'}, page_content='thud')]

| Name | Type | Description |
|---|---|---|
| embedding* | Embeddings | Embedding function to use. |

| Name | Type |
|---|---|
| embedding | Embeddings |
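The `search_type="mmr"` retriever shown earlier trades relevance against diversity via `lambda_mult`. A compact NumPy sketch of greedy maximal marginal relevance (an illustration, not langchain-core's implementation):

```python
import numpy as np

def mmr(query, vectors, k=2, lambda_mult=0.5):
    """Greedily pick k indices balancing query similarity against redundancy."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(query, v) for v in vectors])
    selected, candidates = [], list(range(len(vectors)))
    while candidates and len(selected) < k:
        def score(i):
            # Penalize closeness to anything already selected.
            redundancy = max((cos(vectors[i], vectors[j]) for j in selected), default=0.0)
            return lambda_mult * sims[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

docs = np.array([[1.0, 0.0], [0.95, 0.2], [0.0, 1.0]])
picked = mmr(np.array([1.0, 0.2]), docs, k=2, lambda_mult=0.3)
```

With `lambda_mult=0.3`, the second pick skips the near-duplicate `[1.0, 0.0]` in favor of the more diverse `[0.0, 1.0]`.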
| Method | Description |
|---|---|
| get_by_ids | Get documents by their ids. |
| aget_by_ids | Async get documents by their ids. |
| similarity_search_by_vector | Search for the most similar documents to the given embedding. |
| load | Load a vector store from a file. |
| dump | Dump the vector store to a file. |
| add_texts | Run more texts through the embeddings and add to the VectorStore. |
| aadd_texts | Async run more texts through the embeddings and add to the VectorStore. |
| search | Return docs most similar to query using a specified search type. |
| asearch | Async return docs most similar to query using a specified search type. |
| similarity_search_with_relevance_scores | Return docs and relevance scores in the range [0, 1]. |
| asimilarity_search_with_relevance_scores | Async return docs and relevance scores in the range [0, 1]. |
| amax_marginal_relevance_search | Async return docs selected using the maximal marginal relevance. |
| from_documents | Return VectorStore initialized from documents and embeddings. |
| afrom_documents | Async return VectorStore initialized from documents and embeddings. |
| as_retriever | Return VectorStoreRetriever initialized from this VectorStore. |