124 changes: 124 additions & 0 deletions .github/workflows/pr_chronon_integration_tests.yml
@@ -0,0 +1,124 @@
name: pr-chronon-integration-tests

on:
  pull_request:
    types:
      - opened
      - synchronize
      - labeled
    paths:
      - "sdk/python/feast/infra/online_stores/chronon_online_store/**"
      - "sdk/python/feast/infra/offline_stores/contrib/chronon_offline_store/**"
      - "sdk/python/feast/infra/chronon_provider.py"
      - "sdk/python/tests/unit/infra/online_stores/chronon_online_store/**"
      - "sdk/python/tests/unit/infra/offline_stores/contrib/chronon_offline_store/**"
      - "sdk/python/tests/integration/online_store/test_chronon_online_store.py"
      - "sdk/python/tests/integration/online_store/test_chronon_online_store_real_service.py"
      - "sdk/python/tests/integration/offline_store/test_chronon_offline_store.py"
      - "infra/scripts/chronon/**"
      - "chronon/**"
      - ".github/workflows/pr_chronon_integration_tests.yml"

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  chronon-python-tests:
    if:
      ((github.event.action == 'labeled' && (github.event.label.name == 'approved' || github.event.label.name == 'lgtm' || github.event.label.name == 'ok-to-test')) ||
      (github.event.action != 'labeled' && (contains(github.event.pull_request.labels.*.name, 'ok-to-test') || contains(github.event.pull_request.labels.*.name, 'approved') || contains(github.event.pull_request.labels.*.name, 'lgtm')))) &&
      github.event.pull_request.base.repo.full_name == 'feast-dev/feast'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          repository: ${{ github.event.repository.full_name }}
          ref: ${{ github.ref }}
          token: ${{ secrets.GITHUB_TOKEN }}
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          architecture: x64
      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          enable-cache: true
      - name: Setup Java
        uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "8"
      - name: Install dependencies
        run: make install-python-dependencies-ci
      - name: Checkout Chronon
        env:
          CHRONON_REPO: ${{ github.workspace }}/.chronon
          CHRONON_REPO_URL: https://github.com/airbnb/chronon.git
          CHRONON_REF: 6c0b8de9f0301521baf61a46ff3083c566fb4052 # pragma: allowlist secret
        run: |
          git clone "${CHRONON_REPO_URL}" "${CHRONON_REPO}"
          git -C "${CHRONON_REPO}" checkout "${CHRONON_REF}"
      - name: Install Chronon build dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y \
            apt-transport-https \
            autoconf \
            automake \
            bison \
            build-essential \
            ca-certificates \
            curl \
            flex \
            g++ \
            gnupg \
            libtool \
            pkg-config \
            python3-pip \
            python3-venv
          echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee /etc/apt/sources.list.d/sbt.list
          curl -fsSL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x99E82A75642AC823" | sudo apt-key add -
          sudo apt-get update
          sudo apt-get install -y sbt
          curl -sSL "http://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz" -o /tmp/thrift.tar.gz
          sudo rm -rf /usr/src/thrift
          sudo mkdir -p /usr/src/thrift
          sudo tar zxf /tmp/thrift.tar.gz -C /usr/src/thrift --strip-components=1
          cd /usr/src/thrift
          sudo ./configure --without-python --without-cpp
          sudo make -j2
          sudo make install
          python3 -m pip install --break-system-packages build
          thrift -version
      - name: Build Chronon quickstart and service jars
        env:
          CHRONON_REPO: ${{ github.workspace }}/.chronon
        run: |
          cd "${CHRONON_REPO}/quickstart/mongo-online-impl"
          sbt assembly
          cd "${CHRONON_REPO}"
          sbt "project service" assembly
      - name: Run Chronon unit and stub integration tests
        run: |
          uv run pytest -c sdk/python/pytest.ini \
            sdk/python/tests/unit/infra/online_stores/chronon_online_store \
            sdk/python/tests/unit/infra/offline_stores/contrib/chronon_offline_store \
            sdk/python/tests/integration/online_store/test_chronon_online_store.py \
            sdk/python/tests/integration/offline_store/test_chronon_offline_store.py \
            --integration
      - name: Start live Chronon service
        env:
          CHRONON_REPO: ${{ github.workspace }}/.chronon
        run: infra/scripts/chronon/start-local-chronon-service.sh
      - name: Run Chronon live-service integration test
        env:
          CHRONON_SERVICE_URL: http://127.0.0.1:9000
        run: |
          uv run pytest -c sdk/python/pytest.ini \
            sdk/python/tests/integration/online_store/test_chronon_online_store_real_service.py \
            --integration
      - name: Stop live Chronon service
        if: always()
        run: infra/scripts/chronon/stop-local-chronon-service.sh
3 changes: 3 additions & 0 deletions docs/SUMMARY.md
@@ -125,6 +125,7 @@
* [Clickhouse (contrib)](reference/data-sources/clickhouse.md)
* [Ray (contrib)](reference/data-sources/ray.md)
* [MongoDB (contrib)](reference/data-sources/mongodb.md)
* [Chronon (contrib)](reference/data-sources/chronon.md)
* [Offline stores](reference/offline-stores/README.md)
* [Overview](reference/offline-stores/overview.md)
* [Dask](reference/offline-stores/dask.md)
@@ -142,6 +143,7 @@
* [Oracle (contrib)](reference/offline-stores/oracle.md)
* [Athena (contrib)](reference/offline-stores/athena.md)
* [MongoDB (contrib)](reference/offline-stores/mongodb.md)
* [Chronon (contrib)](reference/offline-stores/chronon.md)
* [Remote Offline](reference/offline-stores/remote-offline-store.md)
* [Hybrid](reference/offline-stores/hybrid.md)
* [Online stores](reference/online-stores/README.md)
@@ -164,6 +166,7 @@
* [SingleStore](reference/online-stores/singlestore.md)
* [Milvus](reference/online-stores/milvus.md)
* [MongoDB](reference/online-stores/mongodb.md)
* [Chronon (contrib)](reference/online-stores/chronon.md)
* [Elasticsearch](reference/online-stores/elasticsearch.md)
* [Qdrant](reference/online-stores/qdrant.md)
* [Faiss](reference/online-stores/faiss.md)
57 changes: 57 additions & 0 deletions docs/reference/data-sources/chronon.md
@@ -0,0 +1,57 @@
# Chronon source (contrib)

## Description

Chronon sources describe feature data produced by [Chronon](https://chronon.ai/) and consumed by Feast.
They point Feast at Chronon's offline materialization output and, when online reads are needed, identify the Chronon Join or GroupBy that should be queried from Chronon's online service.

Feast does not compute or materialize Chronon features. Chronon owns feature computation, backfills, consistency, and online serving. Feast uses the source metadata for registry, discovery, historical retrieval, and online lookup.

## Examples

Defining a Chronon source for a Chronon Join:

```python
from feast.infra.offline_stores.contrib.chronon_offline_store.chronon_source import (
    ChrononSource,
)

driver_stats_source = ChrononSource(
    name="driver_stats",
    materialization_path="data/chronon/driver_stats",
    chronon_join="team/driver_stats.v1",
    online_endpoint="http://localhost:8080",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp",
)
```

Defining a Chronon source for a Chronon GroupBy:

```python
driver_profile_source = ChrononSource(
    name="driver_profile",
    materialization_path="data/chronon/driver_profile",
    chronon_group_by="team/driver_profile.v1",
    timestamp_field="event_timestamp",
)
```

## Configuration reference

| Parameter | Required | Description |
| :------------------------- | :------- | :---------- |
| `materialization_path` | yes | Local or repository-relative path to Chronon's Parquet materialization output. |
| `chronon_join` | no | Chronon Join name used for online reads, for example `team/training_set.v1`. |
| `chronon_group_by` | no | Chronon GroupBy name used for online reads, for example `team/user_features.v1`. |
| `online_endpoint` | no | Chronon online service base URL for this source. If omitted, Feast uses the Chronon online store `path`. |
| `timestamp_field` | yes | Event timestamp column in the materialized Chronon data. |
| `created_timestamp_column` | no | Created timestamp column used to select the latest row when duplicate event timestamps exist. |
| `field_mapping` | no | Standard Feast field mapping applied before retrieval. |

Set at most one of `chronon_join` and `chronon_group_by`. Offline-only sources may omit both, but `online_endpoint` requires one of them so Feast can build the Chronon request URL.
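
The rules above can be sketched as a small check. This is an illustration only: `check_chronon_targets` and `resolve_endpoint` are hypothetical helpers, not part of Feast or this PR, and the actual validation inside `ChrononSource` may differ.

```python
from typing import Optional


def check_chronon_targets(
    chronon_join: Optional[str] = None,
    chronon_group_by: Optional[str] = None,
    online_endpoint: Optional[str] = None,
) -> None:
    """Hypothetical helper that mirrors the documented rules."""
    # A source may name a Join or a GroupBy, but not both.
    if chronon_join and chronon_group_by:
        raise ValueError("Set at most one of chronon_join and chronon_group_by.")
    # An endpoint is only useful when there is something to query.
    if online_endpoint and not (chronon_join or chronon_group_by):
        raise ValueError("online_endpoint requires chronon_join or chronon_group_by.")


def resolve_endpoint(online_endpoint: Optional[str], store_path: str) -> str:
    """Source-level endpoint wins; otherwise fall back to the online store `path`."""
    return online_endpoint or store_path


# Offline-only source: neither target is set, so the check passes.
check_chronon_targets()

# Join with an explicit endpoint: also valid.
check_chronon_targets(
    chronon_join="team/driver_stats.v1",
    online_endpoint="http://localhost:8080",
)
```

Under these assumptions, a source that omits `online_endpoint` resolves to whatever `path` the Chronon online store is configured with.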

## Supported Types

Chronon sources read Parquet data through PyArrow and use Feast's standard PyArrow type mapping.
For a comparison against other batch data sources, see the [functionality matrix](overview.md#functionality-matrix).
94 changes: 94 additions & 0 deletions docs/reference/offline-stores/chronon.md
@@ -0,0 +1,94 @@
# Chronon offline store (contrib)

## Description

The Chronon offline store reads [ChrononSources](../data-sources/chronon.md) from Chronon's Parquet materialization output.

Chronon remains the system of record for feature computation and materialization. Feast reads Chronon-produced data for historical retrieval, feature reuse, and registry-driven training workflows.

## Getting started

Chronon-backed feature repos use the Chronon offline store with Chronon sources:

{% code title="feature_store.yaml" %}
```yaml
project: my_project
registry: data/registry.db
provider: chronon
offline_store:
  type: chronon
online_store:
  type: chronon
  path: http://localhost:8080
```
{% endcode %}

Example feature view:

```python
from feast import Entity, FeatureView, Field
from feast.infra.offline_stores.contrib.chronon_offline_store.chronon_source import (
    ChrononSource,
)
from feast.types import Float32

driver = Entity(name="driver", join_keys=["driver_id"])

driver_stats = FeatureView(
    name="driver_stats",
    entities=[driver],
    schema=[Field(name="rating", dtype=Float32)],
    source=ChrononSource(
        name="driver_stats_source",
        materialization_path="data/chronon/driver_stats",
        chronon_join="team/driver_stats.v1",
        timestamp_field="event_timestamp",
        created_timestamp_column="created_timestamp",
    ),
)
```

## Historical retrieval

`get_historical_features` performs point-in-time joins against Chronon's materialized Parquet data. For each entity row, Feast selects the latest Chronon row with an event timestamp at or before the entity timestamp. If `created_timestamp_column` is configured and duplicate event timestamps exist, the latest created row wins.

This is intended for training and validation workflows that want Feast's registry and retrieval APIs while using Chronon as the feature computation engine.
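
The point-in-time rule described above can be illustrated for a single entity row with plain Python. This is a minimal sketch, not the offline store's implementation: `point_in_time_lookup` and the sample rows are hypothetical, and it covers only the row-selection rule stated here (latest event timestamp at or before the entity timestamp, ties broken by created timestamp).

```python
from datetime import datetime


def point_in_time_lookup(entity_ts, feature_rows):
    """Return the feature row to join for one entity timestamp, or None.

    Hypothetical helper. Each row is a dict with "event_timestamp" and,
    optionally, "created_timestamp".
    """
    # Only rows observed at or before the entity timestamp are eligible.
    eligible = [r for r in feature_rows if r["event_timestamp"] <= entity_ts]
    if not eligible:
        return None
    # Latest event timestamp wins; among duplicates, the latest created row wins.
    return max(
        eligible,
        key=lambda r: (r["event_timestamp"], r.get("created_timestamp", datetime.min)),
    )


# Made-up sample data for illustration.
rows = [
    {"event_timestamp": datetime(2024, 1, 1), "created_timestamp": datetime(2024, 1, 1, 1), "rating": 4.1},
    {"event_timestamp": datetime(2024, 1, 2), "created_timestamp": datetime(2024, 1, 2, 1), "rating": 4.5},
    {"event_timestamp": datetime(2024, 1, 2), "created_timestamp": datetime(2024, 1, 2, 3), "rating": 4.7},
    {"event_timestamp": datetime(2024, 1, 5), "created_timestamp": datetime(2024, 1, 5, 1), "rating": 3.9},
]

selected = point_in_time_lookup(datetime(2024, 1, 3), rows)
print(selected["rating"])  # 4.7: the later-created of the two 2024-01-02 rows
```

The row from 2024-01-05 is never selected for an entity timestamp of 2024-01-03, which is what prevents future feature values from leaking into training data.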

## Configuration reference

| Parameter | Required | Default | Description |
| :-------- | :------- | :------ | :---------- |
| `type` | yes | - | Must be set to `chronon`. |
| `path` | no | - | Reserved for future offline store configuration. Source-level `materialization_path` controls where data is read from. |

## Functionality Matrix

The set of functionality supported by offline stores is described in detail [here](overview.md#functionality).
Below is a matrix indicating which functionality is supported by the Chronon offline store.

| | Chronon |
| :----------------------------------------------------------------- | :------ |
| `get_historical_features` (point-in-time correct join) | yes |
| `pull_latest_from_table_or_query` (retrieve latest feature values) | yes |
| `pull_all_from_table_or_query` (retrieve a saved dataset) | yes |
| `offline_write_batch` (persist dataframes to offline store) | no |
| `write_logged_features` (persist logged features to offline store) | no |

Below is a matrix indicating which functionality is supported by `ChrononRetrievalJob`.

| | Chronon |
| ----------------------------------------------------- | ------- |
| export to dataframe | yes |
| export to arrow table | yes |
| export to arrow batches | no |
| export to SQL | no |
| export to data lake (S3, GCS, etc.) | no |
| export to data warehouse | no |
| export as Spark dataframe | no |
| local execution of Python-based on-demand transforms | no |
| remote execution of Python-based on-demand transforms | no |
| persist results in the offline store | no |
| preview the query plan before execution | no |
| read partitioned data | yes |

To compare this set of functionality against other offline stores, please see the full [functionality matrix](overview.md#functionality-matrix).