2 changes: 1 addition & 1 deletion Makefile
@@ -310,7 +310,7 @@ format-python:
cd ${ROOT_DIR}/sdk/python; python -m black --target-version py38 feast tests

lint-python:
cd ${ROOT_DIR}/sdk/python; python -m mypy
cd ${ROOT_DIR}/sdk/python; python -m mypy --exclude=/tests/ --follow-imports=skip feast
cd ${ROOT_DIR}/sdk/python; python -m isort feast/ tests/ --check-only
cd ${ROOT_DIR}/sdk/python; python -m flake8 feast/ tests/
cd ${ROOT_DIR}/sdk/python; python -m black --check feast tests
1 change: 1 addition & 0 deletions docs/SUMMARY.md
@@ -99,6 +99,7 @@
* [MySQL (contrib)](reference/online-stores/mysql.md)
* [Rockset (contrib)](reference/online-stores/rockset.md)
* [Hazelcast (contrib)](reference/online-stores/hazelcast.md)
* [ScyllaDB (contrib)](reference/online-stores/scylladb.md)
* [Providers](reference/providers/README.md)
* [Local](reference/providers/local.md)
* [Google Cloud Platform](reference/providers/google-cloud-platform.md)
4 changes: 3 additions & 1 deletion docs/reference/online-stores/README.md
@@ -54,4 +54,6 @@ Please see [Online Store](../../getting-started/architecture-and-components/onli
[hazelcast.md](hazelcast.md)
{% endcontent-ref %}


{% content-ref url="scylladb.md" %}
[scylladb.md](scylladb.md)
{% endcontent-ref %}
94 changes: 94 additions & 0 deletions docs/reference/online-stores/scylladb.md
@@ -0,0 +1,94 @@
# ScyllaDB Cloud online store

## Description

ScyllaDB is a low-latency, high-performance, Cassandra-compatible database that uses CQL. Because it speaks the same protocol, you can use Feast's existing Cassandra connector to run ScyllaDB as an online store.

The [ScyllaDB](https://www.scylladb.com/) online store provides support for materializing feature values into a ScyllaDB or [ScyllaDB Cloud](https://www.scylladb.com/product/scylla-cloud/) cluster for serving online features in real time.

## Getting started

Install Feast with Cassandra support:
```bash
pip install "feast[cassandra]"
```

Create a new Feast project:
```bash
feast init REPO_NAME -t cassandra
```

### Example (ScyllaDB)

{% code title="feature_store.yaml" %}
```yaml
project: scylla_feature_repo
registry: data/registry.db
provider: local
online_store:
    type: cassandra
    hosts:
        - 172.17.0.2
    keyspace: feast
    username: scylla
    password: password
```
{% endcode %}

### Example (ScyllaDB Cloud)

{% code title="feature_store.yaml" %}
```yaml
project: scylla_feature_repo
registry: data/registry.db
provider: local
online_store:
    type: cassandra
    hosts:
        - node-0.aws_us_east_1.xxxxxxxx.clusters.scylla.cloud
        - node-1.aws_us_east_1.xxxxxxxx.clusters.scylla.cloud
        - node-2.aws_us_east_1.xxxxxxxx.clusters.scylla.cloud
    keyspace: feast
    username: scylla
    password: password
```
{% endcode %}
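
Once the configuration is in place, the standard Feast workflow applies. A minimal sketch, run from the repository created by `feast init` above and assuming feature definitions are already written:

```bash
# Register feature definitions and create the corresponding tables in ScyllaDB
feast apply

# Materialize feature values from the offline store into the online store,
# up to the current UTC timestamp
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
```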


The full set of configuration options is available in [CassandraOnlineStoreConfig](https://rtd.feast.dev/en/master/#feast.infra.online_stores.contrib.cassandra_online_store.cassandra_online_store.CassandraOnlineStoreConfig).
For a full explanation of the configuration options, please see `sdk/python/feast/infra/online_stores/contrib/cassandra_online_store/README.md`.
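
As an illustration, here is a sketch with a few of those connection options filled in; the values shown (port, protocol version, datacenter name) are assumptions for this example, so consult the config class and README above for the authoritative fields and defaults:

```yaml
online_store:
    type: cassandra
    hosts:
        - 172.17.0.2
    keyspace: feast
    port: 9042                  # native-transport port (assumed default)
    protocol_version: 4         # CQL protocol version to negotiate (assumed)
    load_balancing:
        local_dc: datacenter1   # hypothetical local datacenter name
        load_balancing_policy: TokenAwarePolicy(DCAwareRoundRobinPolicy)
```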

Storage specifications can be found at `docs/specs/online_store_format.md`.

## Functionality Matrix

The set of functionality supported by online stores is described in detail [here](overview.md#functionality).
Below is a matrix indicating which functionality is supported by the Cassandra plugin.

| | Cassandra |
| :-------------------------------------------------------- | :-------- |
| write feature values to the online store | yes |
| read feature values from the online store | yes |
| update infrastructure (e.g. tables) in the online store | yes |
| teardown infrastructure (e.g. tables) in the online store | yes |
| generate a plan of infrastructure changes | yes |
| support for on-demand transforms | yes |
| readable by Python SDK | yes |
| readable by Java | no |
| readable by Go | no |
| support for entityless feature views | yes |
| support for concurrent writing to the same key | no |
| support for ttl (time to live) at retrieval | no |
| support for deleting expired data | no |
| collocated by feature view | yes |
| collocated by feature service | no |
| collocated by entity key | no |

To compare this set of functionality against other online stores, please see the full [functionality matrix](overview.md#functionality-matrix).
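
As a concrete example of the "readable by Python SDK" row, a minimal online read sketch; the feature view `driver_hourly_stats` and entity `driver_id` are placeholders from the default Feast template, not names this store requires:

```python
from feast import FeatureStore

# Point at the repository containing feature_store.yaml
store = FeatureStore(repo_path=".")

# Fetch the latest feature values for a single entity from ScyllaDB
features = store.get_online_features(
    features=["driver_hourly_stats:conv_rate"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
print(features)
```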

## Resources

* [Sample application with ScyllaDB](https://feature-store.scylladb.com/stable/)
* [ScyllaDB website](https://www.scylladb.com/)
* [ScyllaDB Cloud documentation](https://cloud.docs.scylladb.com/stable/)
19 changes: 9 additions & 10 deletions sdk/python/feast/data_source.py
@@ -11,7 +11,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import enum
import warnings
from abc import ABC, abstractmethod
@@ -485,12 +484,12 @@ def to_proto(self) -> DataSourceProto:
return data_source_proto

def validate(self, config: RepoConfig):
pass
raise NotImplementedError

def get_table_column_names_and_types(
self, config: RepoConfig
) -> Iterable[Tuple[str, str]]:
pass
raise NotImplementedError

@staticmethod
def source_datatype_to_feast_value_type() -> Callable[[str], ValueType]:
@@ -534,12 +533,12 @@ def __init__(
self.schema = schema

def validate(self, config: RepoConfig):
pass
raise NotImplementedError

def get_table_column_names_and_types(
self, config: RepoConfig
) -> Iterable[Tuple[str, str]]:
pass
raise NotImplementedError

def __eq__(self, other):
if not isinstance(other, RequestSource):
@@ -610,12 +609,12 @@ def source_datatype_to_feast_value_type() -> Callable[[str], ValueType]:
@typechecked
class KinesisSource(DataSource):
def validate(self, config: RepoConfig):
pass
raise NotImplementedError

def get_table_column_names_and_types(
self, config: RepoConfig
) -> Iterable[Tuple[str, str]]:
pass
raise NotImplementedError

@staticmethod
def from_proto(data_source: DataSourceProto):
@@ -639,7 +638,7 @@ def from_proto(data_source: DataSourceProto):

@staticmethod
def source_datatype_to_feast_value_type() -> Callable[[str], ValueType]:
pass
raise NotImplementedError

def get_table_query_string(self) -> str:
raise NotImplementedError
@@ -772,12 +771,12 @@ def __hash__(self):
return super().__hash__()

def validate(self, config: RepoConfig):
pass
raise NotImplementedError

def get_table_column_names_and_types(
self, config: RepoConfig
) -> Iterable[Tuple[str, str]]:
pass
raise NotImplementedError

@staticmethod
def from_proto(data_source: DataSourceProto):
2 changes: 1 addition & 1 deletion sdk/python/feast/feature_service.py
@@ -56,7 +56,7 @@ def __init__(
*,
name: str,
features: List[Union[FeatureView, OnDemandFeatureView]],
tags: Dict[str, str] = None,
tags: Optional[Dict[str, str]] = None,
description: str = "",
owner: str = "",
logging_config: Optional[LoggingConfig] = None,
2 changes: 1 addition & 1 deletion sdk/python/feast/feature_view.py
@@ -101,7 +101,7 @@ def __init__(
name: str,
source: DataSource,
schema: Optional[List[Field]] = None,
entities: List[Entity] = None,
entities: Optional[List[Entity]] = None,
ttl: Optional[timedelta] = timedelta(days=0),
online: bool = True,
description: str = "",
2 changes: 1 addition & 1 deletion sdk/python/feast/importer.py
@@ -7,7 +7,7 @@
)


def import_class(module_name: str, class_name: str, class_type: str = None):
def import_class(module_name: str, class_name: str, class_type: str = ""):
"""
Dynamically loads and returns a class from a module.

15 changes: 12 additions & 3 deletions sdk/python/feast/infra/contrib/spark_kafka_processor.py
@@ -1,10 +1,11 @@
from types import MethodType
from typing import List, Optional
from typing import List, Optional, no_type_check

import pandas as pd
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql.avro.functions import from_avro
from pyspark.sql.functions import col, from_json
from pyspark.sql.streaming import StreamingQuery

from feast.data_format import AvroFormat, JsonFormat
from feast.data_source import KafkaSource, PushMode
@@ -63,12 +64,20 @@ def __init__(
self.join_keys = [fs.get_entity(entity).join_key for entity in sfv.entities]
super().__init__(fs=fs, sfv=sfv, data_source=sfv.stream_source)

def ingest_stream_feature_view(self, to: PushMode = PushMode.ONLINE) -> None:
# Type hinting for data_source type.
# data_source type has been checked to be an instance of KafkaSource.
self.data_source: KafkaSource = self.data_source # type: ignore

def ingest_stream_feature_view(
self, to: PushMode = PushMode.ONLINE
) -> StreamingQuery:
ingested_stream_df = self._ingest_stream_data()
transformed_df = self._construct_transformation_plan(ingested_stream_df)
online_store_query = self._write_stream_data(transformed_df, to)
return online_store_query

# In line 64 of __init__(), "data_source" is assigned a stream_source (which has to be a KafkaSource, as checked at line 40).
@no_type_check
def _ingest_stream_data(self) -> StreamTable:
"""Only supports json and avro formats currently."""
if self.format == "json":
@@ -122,7 +131,7 @@ def _ingest_stream_data(self) -> StreamTable:
def _construct_transformation_plan(self, df: StreamTable) -> StreamTable:
return self.sfv.udf.__call__(df) if self.sfv.udf else df

def _write_stream_data(self, df: StreamTable, to: PushMode):
def _write_stream_data(self, df: StreamTable, to: PushMode) -> StreamingQuery:
# Validation occurs at the fs.write_to_online_store() phase against the stream feature view schema.
def batch_write(row: DataFrame, batch_id: int):
rows: pd.DataFrame = row.toPandas()
17 changes: 11 additions & 6 deletions sdk/python/feast/infra/contrib/stream_processor.py
@@ -1,8 +1,9 @@
from abc import ABC
from abc import ABC, abstractmethod
from types import MethodType
from typing import TYPE_CHECKING, Optional

from pyspark.sql import DataFrame
from typing_extensions import TypeAlias

from feast.data_source import DataSource, PushMode
from feast.importer import import_class
@@ -17,7 +18,7 @@
}

# TODO: support more types other than just Spark.
StreamTable = DataFrame
StreamTable: TypeAlias = DataFrame


class ProcessorConfig(FeastConfigBaseModel):
@@ -49,33 +50,37 @@ def __init__(
self.sfv = sfv
self.data_source = data_source

@abstractmethod
def ingest_stream_feature_view(self, to: PushMode = PushMode.ONLINE) -> None:
"""
Ingests data from the stream source attached to the stream feature view; transforms the data
and then persists it to the online store and/or offline store, depending on the 'to' parameter.
"""
pass
raise NotImplementedError

@abstractmethod
def _ingest_stream_data(self) -> StreamTable:
"""
Ingests data into a StreamTable.
"""
pass
raise NotImplementedError

@abstractmethod
def _construct_transformation_plan(self, table: StreamTable) -> StreamTable:
"""
Applies transformations on top of StreamTable object. Since stream engines use lazy
evaluation, the StreamTable will not be materialized until it is actually evaluated.
For example: df.collect() in spark or tbl.execute() in Flink.
"""
pass
raise NotImplementedError

@abstractmethod
def _write_stream_data(self, table: StreamTable, to: PushMode) -> None:
"""
Launches a job to persist stream data to the online store and/or offline store, depending
on the 'to' parameter, and returns a handle for the job.
"""
pass
raise NotImplementedError


def get_stream_processor_object(
@@ -1,5 +1,6 @@
from typing import Literal

from pydantic import StrictBool, StrictStr
from pydantic.typing import Literal

from feast.infra.feature_servers.base_config import BaseFeatureServerConfig

2 changes: 1 addition & 1 deletion sdk/python/feast/infra/feature_servers/base_config.py
@@ -30,5 +30,5 @@ class BaseFeatureServerConfig(FeastConfigBaseModel):
enabled: StrictBool = False
"""Whether the feature server should be launched."""

feature_logging: Optional[FeatureLoggingConfig]
feature_logging: Optional[FeatureLoggingConfig] = None
""" Feature logging configuration """
@@ -1,5 +1,6 @@
from typing import Literal

from pydantic import StrictBool
from pydantic.typing import Literal

from feast.infra.feature_servers.base_config import BaseFeatureServerConfig

@@ -1,4 +1,4 @@
from pydantic.typing import Literal
from typing import Literal

from feast.infra.feature_servers.base_config import BaseFeatureServerConfig

6 changes: 2 additions & 4 deletions sdk/python/feast/infra/materialization/snowflake_engine.py
@@ -7,7 +7,7 @@
import click
import pandas as pd
from colorama import Fore, Style
from pydantic import Field, StrictStr
from pydantic import ConfigDict, Field, StrictStr
from pytz import utc
from tqdm import tqdm

Expand Down Expand Up @@ -72,9 +72,7 @@ class SnowflakeMaterializationEngineConfig(FeastConfigBaseModel):

schema_: Optional[str] = Field("PUBLIC", alias="schema")
""" Snowflake schema name """

class Config:
allow_population_by_field_name = True
model_config = ConfigDict(populate_by_name=True)


@dataclass