ray.md

Ray Data Source (contrib)

⚠️ Contrib Plugin: RaySource is a contributed plugin shipped alongside the Ray offline store. It may not be as stable or fully supported as core data sources.

RaySource is a pure-metadata descriptor that tells Feast how to load a Ray Dataset from any source that Ray Data supports natively — Parquet, CSV, JSON, HuggingFace Datasets, MongoDB, binary files, images, TFRecords, and more.

It is the recommended data source when using the Ray offline store and replaces the need for FileSource for all non-Parquet and non-file-based data.

When to use RaySource vs FileSource

Scenario	Recommended source
Parquet files on disk / S3 / GCS (existing setup)	`FileSource` (backward compatible)
Parquet via Ray reader (pipelines, remote auth)	`RaySource(reader_type="parquet")`
CSV, JSON, text, images via Ray	`RaySource`
HuggingFace `datasets` library	`RaySource(reader_type="huggingface")`
MongoDB, SQL, TFRecords, WebDataset	`RaySource`

Installation

RaySource is bundled with the Ray offline store contrib package:

pip install 'feast[ray]'

Supported `reader_type` values

`reader_type`	Underlying Ray API	Notes
`parquet`	`ray.data.read_parquet`	S3, GCS, HDFS, local
`csv`	`ray.data.read_csv`
`json`	`ray.data.read_json`
`text`	`ray.data.read_text`
`images`	`ray.data.read_images`
`binary_files`	`ray.data.read_binary_files`
`tfrecords`	`ray.data.read_tfrecords`
`webdataset`	`ray.data.read_webdataset`
`huggingface`	`ray.data.from_huggingface`	Wraps `datasets.load_dataset`
`mongo`	`ray.data.read_mongo`
`sql`	`ray.data.read_sql`	Pass `connection_url` in `reader_options`

Configuration

Parameters

Parameter	Type	Required	Description
`name`	`str`	Yes	Unique name for this data source
`reader_type`	`str`	Yes	One of the supported reader types above
`path`	`str`	No	File or directory path (required for file-based readers)
`reader_options`	`dict`	No	Extra keyword arguments forwarded to the Ray reader
`timestamp_field`	`str`	No	Column containing event timestamps
`created_timestamp_column`	`str`	No	Column containing row creation timestamps
`tags`	`dict`	No	Arbitrary key-value metadata
`description`	`str`	No	Human-readable description
`owner`	`str`	No	Owning team or contact

Usage examples

Parquet on S3

from feast.infra.offline_stores.contrib.ray_offline_store.ray_source import RaySource

driver_stats = RaySource(
    name="driver_stats_parquet",
    reader_type="parquet",
    path="s3://my-bucket/driver_stats/",
    timestamp_field="event_timestamp",
)

CSV

sensor_readings = RaySource(
    name="sensor_readings_csv",
    reader_type="csv",
    path="/data/sensors/",
    timestamp_field="ts",
)

HuggingFace dataset

Load a dataset from the HuggingFace Hub directly into Feast.

from feast.infra.offline_stores.contrib.ray_offline_store.ray_source import RaySource

cheque_images = RaySource(
    name="cheque_images_hf",
    reader_type="huggingface",
    reader_options={
        "dataset_name": "cheques_sample_data",
        "split": "train",
    },
    timestamp_field="event_timestamp",
)

MongoDB

transaction_log = RaySource(
    name="transactions_mongo",
    reader_type="mongo",
    reader_options={
        "uri": "mongodb://localhost:27017",
        "database": "featuredb",
        "collection": "transactions",
    },
    timestamp_field="created_at",
)

SQL (via connection URL)

user_features = RaySource(
    name="user_features_sql",
    reader_type="sql",
    reader_options={
        "connection_url": "postgresql+psycopg2://user:password@host:5432/db",  # pragma: allowlist secret
        "query": "SELECT * FROM user_features",
    },
    timestamp_field="event_timestamp",
)

Using RaySource in a BatchFeatureView

from datetime import timedelta
from feast import BatchFeatureView, Entity, Field
from feast.types import Float32, Int64, String
from feast.infra.offline_stores.contrib.ray_offline_store.ray_source import RaySource

cheque = Entity(name="cheque_id", description="Unique cheque identifier")

cheque_source = RaySource(
    name="cheque_images_hf",
    reader_type="huggingface",
    reader_options={
        "dataset_name": "cheques_sample_data",
        "split": "train",
    },
    timestamp_field="event_timestamp",
)

cheque_ocr_fv = BatchFeatureView(
    name="cheque_ocr_features",
    entities=[cheque],
    ttl=timedelta(days=365),
    schema=[
        Field(name="cheque_id", dtype=Int64),
        Field(name="payee_name", dtype=String),
        Field(name="amount", dtype=String),
        Field(name="bank_name", dtype=String),
        Field(name="raw_text", dtype=String),
    ],
    source=cheque_source,
)

Retrieving data as a Ray Dataset

Once the feature view is materialised you can retrieve the offline features directly as a Ray Dataset using the first-class to_ray_dataset() method:

from feast import FeatureStore

store = FeatureStore(".")

# Chain directly on the retrieval job — to_ray_dataset() is a first-class
# method on every RetrievalJobs.
ds = store.get_historical_features(
    features=["cheque_ocr_features:payee_name", "cheque_ocr_features:amount"],
    entity_df=entity_df,
).to_ray_dataset()

# Use the dataset downstream in Ray or ML pipelines
ds.show(3)

Proto serialisation

RaySource is fully serialisable to Feast's protobuf registry format. The reader_type, path, and reader_options dict are all persisted and can be round-tripped via to_proto() / from_proto().

Limitations

The Ray offline store (and therefore RaySource) requires feast[ray].
reader_type="sql" requires a serialisable connection_url; raw sqlalchemy.engine.Engine objects cannot be pickled across Ray workers.
Streaming sources (Kafka, Kinesis) are not supported via RaySource; use the dedicated Kafka or Kinesis data sources.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ray Data Source (contrib)

When to use RaySource vs FileSource

Installation

Supported `reader_type` values

Configuration

Parameters

Usage examples

Parquet on S3

CSV

HuggingFace dataset

MongoDB

SQL (via connection URL)

Using RaySource in a BatchFeatureView

Retrieving data as a Ray Dataset

Proto serialisation

Limitations

Related pages

FilesExpand file tree

ray.md

Latest commit

History

ray.md

File metadata and controls

Ray Data Source (contrib)

When to use RaySource vs FileSource

Installation

Supported reader_type values

Configuration

Parameters

Usage examples

Parquet on S3

CSV

HuggingFace dataset

MongoDB

SQL (via connection URL)

Using RaySource in a BatchFeatureView

Retrieving data as a Ray Dataset

Proto serialisation

Limitations

Related pages

Supported `reader_type` values