Add SQLite Hybrid Search and BM25 for local vector search #5073

@franciscojavierarceo

Description

Is your feature request related to a problem? Please describe.
See this example: https://github.com/liamca/sqlite-hybrid-search/tree/main and the SQLite FTS5 docs: https://www.sqlite.org/fts5.html

This should be complemented with the SQLite-vec implementation.
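
For reference, here is a minimal sketch of BM25 keyword ranking with SQLite FTS5, using only Python's standard sqlite3 module. The table name docs, its columns, and the helper bm25_search are made up for illustration and are not part of Feast. The sketch also assumes the local SQLite build was compiled with FTS5.

```python
import sqlite3


def bm25_search(conn, query, top_k=3):
    """Return (rowid, content, score) rows ranked by FTS5's built-in bm25()."""
    # bm25() returns lower (more negative) values for better matches,
    # so ascending order puts the best match first.
    return conn.execute(
        "SELECT rowid, content, bm25(docs) AS score "
        "FROM docs WHERE docs MATCH ? ORDER BY score LIMIT ?",
        (query, top_k),
    ).fetchall()


# Hypothetical in-memory corpus, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, content)")
conn.executemany(
    "INSERT INTO docs(title, content) VALUES (?, ?)",
    [
        ("Feast overview", "Feast is a feature store for machine learning."),
        ("SQLite FTS5", "FTS5 provides full-text search with BM25 ranking."),
        ("Vector search", "Embeddings enable semantic similarity search."),
    ],
)

# Bare terms in an FTS5 query are combined with an implicit AND.
hits = bm25_search(conn, "bm25 ranking")
for rowid, content, score in hits:
    print(rowid, content, score)
```

A hybrid search would combine a ranking like this with the vector ranking that sqlite-vec produces.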

Describe the solution you'd like

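from datetime import timedelta

from feast import FeatureView, Field
from feast.types import Array, Float32, Int64, String, UnixTimestamp

# item, author, and rag_documents_source are assumed to be defined elsewhere
# in the feature repo (two entities and a data source).
document_embeddings = FeatureView(
    name="embedded_documents",
    entities=[item, author],
    schema=[
        Field(
            name="vector",
            dtype=Array(Float32),
            # Look how easy it is to enable RAG!
            vector_index=True,
            vector_search_metric="COSINE",
        ),
        Field(name="item_id", dtype=Int64),
        Field(name="author_id", dtype=String),
        Field(name="created_timestamp", dtype=UnixTimestamp),
        Field(name="sentence_chunks", dtype=String),
        Field(name="event_timestamp", dtype=UnixTimestamp),
    ],
    source=rag_documents_source,
    ttl=timedelta(hours=24),
)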

Somewhere in the FeatureView we should allow the search (for example, hybrid or BM25) to be declared explicitly.

Also, we need to drop query_embedding as a required input in:

results = store.retrieve_online_documents_v2(
    features=[
        "document_embeddings:Embeddings",
        "document_embeddings:content",
        "document_embeddings:title",
    ],
    query=query_embedding,
    query_string="(content: 5) OR (title: 1) OR (title: 3)",
    top_k=3,
).to_dict()
print(results)
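
To illustrate what an optional query embedding could look like, here is a rough sketch of a retrieval function that accepts a query string, an embedding, or both, and fuses the two rankings with reciprocal rank fusion. This is not Feast's implementation: hybrid_search, the vectors dict (a stand-in for sqlite-vec), the docs table, and the choice of reciprocal rank fusion are all assumptions for illustration. It also assumes the local SQLite build includes FTS5.

```python
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(conn, vectors, query_string=None, query_embedding=None,
                  top_k=3, rrf_k=60):
    """Return rowids ranked by keyword search, vector search, or both."""
    if query_string is None and query_embedding is None:
        raise ValueError("provide query_string, query_embedding, or both")

    rankings = []
    if query_string is not None:
        # Keyword ranking from FTS5; the best BM25 match comes first.
        rows = conn.execute(
            "SELECT rowid FROM docs WHERE docs MATCH ? ORDER BY bm25(docs)",
            (query_string,),
        ).fetchall()
        rankings.append([row[0] for row in rows])
    if query_embedding is not None:
        # Brute-force vector ranking; sqlite-vec would do this inside SQLite.
        rankings.append(sorted(
            vectors,
            key=lambda rid: cosine(vectors[rid], query_embedding),
            reverse=True,
        ))

    # Reciprocal rank fusion: each ranking adds 1 / (rrf_k + position).
    scores = {}
    for ranking in rankings:
        for position, rid in enumerate(ranking, start=1):
            scores[rid] = scores.get(rid, 0.0) + 1.0 / (rrf_k + position)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical corpus: three documents with toy two-dimensional embeddings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(content)")
conn.executemany(
    "INSERT INTO docs(rowid, content) VALUES (?, ?)",
    [
        (1, "sqlite full text search"),
        (2, "vector similarity search"),
        (3, "hybrid search with sqlite"),
    ],
)
vectors = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.7, 0.7]}

keyword_only = hybrid_search(conn, vectors, query_string="sqlite")
vector_only = hybrid_search(conn, vectors, query_embedding=[0.0, 1.0])
both = hybrid_search(conn, vectors, query_string="sqlite",
                     query_embedding=[0.0, 1.0])
print(keyword_only, vector_only, both)
```

With this shape, a keyword-only call needs no embedding at all, which is the behavior requested above for retrieve_online_documents_v2.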

Describe alternatives you've considered
TBD

Metadata

Labels: kind/feature (New feature or request), wontfix (This will not be worked on)