1 change: 1 addition & 0 deletions docs/getting-started/genai.md
@@ -15,6 +15,7 @@ Feast integrates with popular vector databases to store and retrieve embedding v
* **Elasticsearch**: Scalable vector search capabilities
* **Postgres with PGVector**: SQL-based vector operations
* **Qdrant**: Purpose-built vector database integration
* **ScyllaDB**: Native `vector<float, N>` type with HNSW ANN index, full `retrieve_online_documents_v2` support

These integrations allow you to:
- Store embeddings as features
3 changes: 2 additions & 1 deletion docs/reference/alpha-vector-database.md
@@ -15,6 +15,7 @@ Below are supported vector databases and implemented features:
| Faiss | [ ] | [ ] | [ ] | [ ] |
| SQLite | [x] | [ ] | [x] | [x] |
| Qdrant | [x] | [x] | [ ] | [ ] |
| ScyllaDB | [x] | [x] | [x] | [x] |

*Note: V2 Support means the SDK supports retrieval of features along with vector embeddings from vector similarity search.

@@ -30,7 +31,7 @@ Beyond that, we will then have `retrieve_online_documents` and `retrieve_online_
backwards compatibility and to adopt industry-standard naming conventions.
{% endhint %}

**Note**: Milvus and SQLite implement the v2 `retrieve_online_documents_v2` method in the SDK. This will be the longer-term solution so that Data Scientists can easily enable vector similarity search by just flipping a flag.
**Note**: Milvus, SQLite, and ScyllaDB implement the v2 `retrieve_online_documents_v2` method in the SDK. This will be the longer-term solution so that Data Scientists can easily enable vector similarity search by just flipping a flag.

## Examples

93 changes: 73 additions & 20 deletions docs/reference/online-stores/scylladb.md
@@ -2,20 +2,15 @@

## Description

ScyllaDB is a low-latency and high-performance Cassandra-compatible (uses CQL) database. You can use the existing Cassandra connector to use ScyllaDB as an online store in Feast.

The [ScyllaDB](https://www.scylladb.com/) online store provides support for materializing feature values into a ScyllaDB or [ScyllaDB Cloud](https://www.scylladb.com/product/scylla-cloud/) cluster for serving online features real-time.
[ScyllaDB](https://www.scylladb.com/) is a distributed real-time NoSQL database with vector search support.
This integration uses the native **`scylla-driver`** Python driver for optimised performance and supports materializing feature values into a [ScyllaDB Cloud](https://www.scylladb.com/product/scylla-cloud/) cluster for real-time online feature serving.

## Getting started

Install Feast with Cassandra support:
```bash
pip install "feast[cassandra]"
```
Install Feast with the `scylladb` extra, which pulls in `scylla-driver` automatically:

Create a new Feast project:
```bash
feast init REPO_NAME -t cassandra
pip install "feast[scylladb]"
```
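
After installing, a quick check can confirm that the driver is importable. This is a minimal sketch, assuming that `scylla-driver` exposes the same top-level `cassandra` package as the Cassandra driver it derives from; the helper name `driver_available` is made up for illustration and is not part of Feast.

```python
import importlib.util


def driver_available(module_name: str = "cassandra") -> bool:
    """Return True if the given top-level module can be found on the import path."""
    # find_spec locates the module without actually importing it.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: scylla-driver installs under the `cassandra` package name.
    print("driver importable:", driver_available())
```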

### Example (ScyllaDB)
@@ -26,7 +21,7 @@ project: scylla_feature_repo
registry: data/registry.db
provider: local
online_store:
    type: cassandra
    type: scylladb
    hosts:
        - 172.17.0.2
    keyspace: feast
@@ -43,36 +38,94 @@ project: scylla_feature_repo
registry: data/registry.db
provider: local
online_store:
    type: cassandra
    type: scylladb
    hosts:
        - node-0.aws_us_east_1.xxxxxxxx.clusters.scylla.cloud
        - node-1.aws_us_east_1.xxxxxxxx.clusters.scylla.cloud
        - node-2.aws_us_east_1.xxxxxxxx.clusters.scylla.cloud
    keyspace: feast
    username: scylla
    password: password
    password: xxxxxx
    local_dc: AWS_US_EAST_1
```
{% endcode %}


The full set of configuration options is available in [CassandraOnlineStoreConfig](https://rtd.feast.dev/en/master/#feast.infra.online_stores.cassandra_online_store.cassandra_online_store.CassandraOnlineStoreConfig).
For a full explanation of configuration options please look at file
`sdk/python/feast/infra/online_stores/contrib/cassandra_online_store/README.md`.
## Configuration options

| Parameter | Type | Default | Description |
| :--- | :--- | :--- | :--- |
| `hosts` | list[str] | *(required)* | Contact-point host addresses. |
| `port` | int | `9042` | CQL port. |
| `keyspace` | str | `feast_keyspace` | Target ScyllaDB keyspace. |
| `username` | str | `None` | Auth username. |
| `password` | str | `None` | Auth password. |
| `local_dc` | str | `None` | Local datacenter name for DC-aware load balancing. |
| `request_timeout` | float | `None` | Driver request timeout in seconds. |
| `read_concurrency` | int | `100` | `concurrency` argument passed to the driver's `execute_concurrent_with_args` for reads. Controls how many CQL statements are in-flight at once. |
| `write_concurrency` | int | `100` | `concurrency` argument passed to the driver's `execute_concurrent_with_args` for writes. Controls how many CQL statements are in-flight at once. |
| `vector_similarity_function` | str | `COSINE` | Default similarity function for vector indexes. Supported: `COSINE`, `DOT_PRODUCT`, `EUCLIDEAN`. Can be overridden per-feature via the `similarity_function` Field tag. |

Storage specifications can be found at `docs/specs/online_store_format.md`.
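
To make the defaults in the table concrete, here is a minimal sketch of a configuration object that mirrors them. `ScyllaConfigSketch` is a hypothetical stand-in, not the config class this PR adds, and the validation in `__post_init__` is only an assumption about what "required" and "Supported" imply.

```python
from dataclasses import dataclass
from typing import List, Optional

# Values listed as supported in the options table.
SUPPORTED_SIMILARITY = {"COSINE", "DOT_PRODUCT", "EUCLIDEAN"}


@dataclass
class ScyllaConfigSketch:
    """Hypothetical mirror of the documented options; not the real Feast class."""

    hosts: List[str]
    port: int = 9042
    keyspace: str = "feast_keyspace"
    username: Optional[str] = None
    password: Optional[str] = None
    local_dc: Optional[str] = None
    request_timeout: Optional[float] = None
    read_concurrency: int = 100
    write_concurrency: int = 100
    vector_similarity_function: str = "COSINE"

    def __post_init__(self) -> None:
        # Assumed checks: at least one contact point and a known similarity function.
        if not self.hosts:
            raise ValueError("hosts is required")
        if self.vector_similarity_function not in SUPPORTED_SIMILARITY:
            raise ValueError(
                f"unsupported similarity function: {self.vector_similarity_function}"
            )


# Only `hosts` and `keyspace` are set; everything else falls back to the defaults.
cfg = ScyllaConfigSketch(hosts=["172.17.0.2"], keyspace="feast")
print(cfg.port, cfg.read_concurrency, cfg.vector_similarity_function)
```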

## Vector Search

ScyllaDB Cloud supports approximate nearest-neighbour (ANN) vector search.
To enable it for a feature view, set the `vector_index` tag to `"true"` on the embedding `Field` and give the number of dimensions in the `dimensions` tag:

{% code title="feature_definitions.py" %}
```python
from feast import FeatureView, Field
from feast.types import Array, Float32, String

# `item` (an entity) and `push_source` (a data source) are assumed to be
# defined elsewhere in the feature repository.
documents_fv = FeatureView(
    name="documents",
    entities=[item],
    schema=[
        Field(name="text", dtype=String),
        Field(
            name="embedding",
            dtype=Array(Float32),
            tags={
                "vector_index": "true",
                "dimensions": "768",
                "similarity_function": "COSINE",  # COSINE | DOT_PRODUCT | EUCLIDEAN
            },
        ),
    ],
    online=True,
    source=push_source,
)
```
{% endcode %}
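
The tags are plain strings, so the store has to interpret them. The sketch below shows one way that could work: it derives the `vector<float, N>` column type and applies the per-feature `similarity_function` override on top of the store-wide default. `resolve_vector_settings` is a hypothetical helper; how the store actually parses tags (including whether `"true"` is matched case-insensitively) is not shown in this PR and is assumed here.

```python
from typing import Dict, Optional, Tuple


def resolve_vector_settings(
    tags: Dict[str, str], store_default: str = "COSINE"
) -> Optional[Tuple[str, str]]:
    """Return (cql_type, similarity_function), or None if the field is not indexed."""
    # Assumption: the flag is compared case-insensitively.
    if tags.get("vector_index", "").lower() != "true":
        return None
    # Tag values are strings, so the dimension count must be converted.
    dimensions = int(tags["dimensions"])
    # A per-feature tag overrides the store-wide default.
    similarity = tags.get("similarity_function", store_default)
    return f"vector<float, {dimensions}>", similarity


print(resolve_vector_settings({"vector_index": "true", "dimensions": "768"}))
```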

When `feast apply` runs, the store creates:

- A regular feature table (`{project}_{fv_name}`) for `online_read` / `online_write_batch`.
- A vector table (`{project}_{fv_name}__{feature}_vec`) with a native `vector<float, N>` column and an HNSW ANN index.
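
The two naming patterns above can be written out as follows. The function names are made up for illustration, and the real store may additionally quote or shorten identifiers, which this sketch does not attempt.

```python
def feature_table_name(project: str, fv_name: str) -> str:
    """Name of the regular feature table: {project}_{fv_name}."""
    return f"{project}_{fv_name}"


def vector_table_name(project: str, fv_name: str, feature: str) -> str:
    """Name of the per-feature vector table: {project}_{fv_name}__{feature}_vec."""
    return f"{project}_{fv_name}__{feature}_vec"


# Using the project and feature view names from the examples in this document.
print(feature_table_name("scylla_feature_repo", "documents"))
print(vector_table_name("scylla_feature_repo", "documents", "embedding"))
```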

To query the top-k most similar documents:

```python
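from feast import FeatureStore

# Assumes the current directory is the feature repository.
store = FeatureStore(repo_path=".")

result = store.retrieve_online_documents_v2(
    features=["documents:text", "documents:embedding"],
    query=[0.1, 0.2, ...],  # your query embedding
    top_k=10,
    distance_metric="COSINE",
)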
```
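
For intuition about what the three metrics rank by, the sketch below computes them in plain Python and performs an exact (brute-force) top-k search over a tiny in-memory set. It uses the textbook definitions; the HNSW index returns approximate results, and the scores the database reports may be scaled differently. All names here are illustrative and not part of Feast or the driver.

```python
import math
from typing import Dict, List, Sequence, Tuple


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumes both vectors are non-zero.
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(
    query: Sequence[float],
    docs: Dict[str, Sequence[float]],
    top_k: int,
    metric: str = "COSINE",
) -> List[Tuple[str, float]]:
    """Brute-force top-k: score every document, then sort by the metric."""
    if metric == "COSINE":
        score, higher_is_better = cosine_similarity, True
    elif metric == "DOT_PRODUCT":
        score, higher_is_better = dot_product, True
    elif metric == "EUCLIDEAN":
        # For a distance, smaller means more similar.
        score, higher_is_better = euclidean_distance, False
    else:
        raise ValueError(f"unsupported metric: {metric}")
    scored = [(name, score(query, vec)) for name, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=higher_is_better)
    return scored[:top_k]


docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(exact_top_k([1.0, 0.0], docs, top_k=2))
```

An ANN index trades this exhaustive scan for speed, so its results can occasionally differ from the exact ranking.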

## Functionality Matrix

The set of functionality supported by online stores is described in detail [here](overview.md#functionality).
Below is a matrix indicating which functionality is supported by the Cassandra plugin.
Below is a matrix indicating which functionality is supported by the ScyllaDB online store.

| | Cassandra |
| | ScyllaDB |
| :-------------------------------------------------------- | :-------- |
| write feature values to the online store | yes |
| read feature values from the online store | yes |
| update infrastructure (e.g. tables) in the online store | yes |
| teardown infrastructure (e.g. tables) in the online store | yes |
| generate a plan of infrastructure changes | yes |
| generate a plan of infrastructure changes | no |
| support for on-demand transforms | yes |
| readable by Python SDK | yes |
| readable by Java | no |
@@ -89,6 +142,6 @@ To compare this set of functionality against other online stores, please see the

## Resources

* [Sample application with ScyllaDB](https://feature-store.scylladb.com/stable/)
* [ScyllaDB Vector Search documentation](https://cloud.docs.scylladb.com/stable/vector-search/)
* [ScyllaDB website](https://www.scylladb.com/)
* [ScyllaDB Cloud documentation](https://cloud.docs.scylladb.com/stable/)
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -57,6 +57,7 @@ azure = [
"pymssql<2.3.3"
]
cassandra = ["cassandra-driver>=3.24.0,<4"]
scylladb = ["scylla-driver>=3.28.0,<4"]
clickhouse = ["clickhouse-connect>=0.7.19"]
couchbase = ["couchbase==4.3.2", "couchbase-columnar==1.0.0"]
delta = ["deltalake<1.0.0"]
@@ -158,7 +159,7 @@ test = [
]

ci = [
"feast[test, aws, azure, cassandra, clickhouse, couchbase, delta, docling, duckdb, elasticsearch, faiss, gcp, ge, go, grpcio, hazelcast, hbase, ibis, image, k8s, mcp, milvus, mlflow, mongodb, mssql, mysql, openlineage, opentelemetry, oracle, spark, trino, postgres, pytorch, qdrant, rag, ray, redis, singlestore, snowflake, sqlite_vec]",
"feast[test, aws, azure, cassandra, clickhouse, couchbase, delta, docling, duckdb, elasticsearch, faiss, gcp, ge, go, grpcio, hazelcast, hbase, ibis, image, k8s, mcp, milvus, mlflow, mongodb, mssql, mysql, openlineage, opentelemetry, oracle, scylladb, spark, trino, postgres, pytorch, qdrant, rag, ray, redis, singlestore, snowflake, sqlite_vec]",
"build",
"virtualenv==20.23.0",
"dbt-artifacts-parser",