Commit a68a4ef

feat: add Chronon online and offline store integrations
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
1 parent eb042f0 commit a68a4ef

22 files changed

Lines changed: 2262 additions & 0 deletions

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
+name: pr-chronon-integration-tests
+
+on:
+  pull_request:
+    types:
+      - opened
+      - synchronize
+      - labeled
+    paths:
+      - "sdk/python/feast/infra/online_stores/chronon_online_store/**"
+      - "sdk/python/feast/infra/offline_stores/contrib/chronon_offline_store/**"
+      - "sdk/python/feast/infra/chronon_provider.py"
+      - "sdk/python/tests/unit/infra/online_stores/chronon_online_store/**"
+      - "sdk/python/tests/unit/infra/offline_stores/contrib/chronon_offline_store/**"
+      - "sdk/python/tests/integration/online_store/test_chronon_online_store.py"
+      - "sdk/python/tests/integration/online_store/test_chronon_online_store_real_service.py"
+      - "sdk/python/tests/integration/offline_store/test_chronon_offline_store.py"
+      - "infra/scripts/chronon/**"
+      - "chronon/**"
+      - ".github/workflows/pr_chronon_integration_tests.yml"
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  chronon-python-tests:
+    if:
+      ((github.event.action == 'labeled' && (github.event.label.name == 'approved' || github.event.label.name == 'lgtm' || github.event.label.name == 'ok-to-test')) ||
+      (github.event.action != 'labeled' && (contains(github.event.pull_request.labels.*.name, 'ok-to-test') || contains(github.event.pull_request.labels.*.name, 'approved') || contains(github.event.pull_request.labels.*.name, 'lgtm')))) &&
+      github.event.pull_request.base.repo.full_name == 'feast-dev/feast'
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          repository: ${{ github.event.repository.full_name }}
+          ref: ${{ github.ref }}
+          token: ${{ secrets.GITHUB_TOKEN }}
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+          architecture: x64
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+      - name: Setup Java
+        uses: actions/setup-java@v4
+        with:
+          distribution: temurin
+          java-version: "8"
+      - name: Install dependencies
+        run: make install-python-dependencies-ci
+      - name: Checkout Chronon
+        env:
+          CHRONON_REPO: ${{ github.workspace }}/.chronon
+          CHRONON_REPO_URL: https://github.com/airbnb/chronon.git
+          CHRONON_REF: 6c0b8de9f0301521baf61a46ff3083c566fb4052 # pragma: allowlist secret
+        run: |
+          git clone "${CHRONON_REPO_URL}" "${CHRONON_REPO}"
+          git -C "${CHRONON_REPO}" checkout "${CHRONON_REF}"
+      - name: Install Chronon build dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y \
+            apt-transport-https \
+            autoconf \
+            automake \
+            bison \
+            build-essential \
+            ca-certificates \
+            curl \
+            flex \
+            g++ \
+            gnupg \
+            libtool \
+            pkg-config \
+            python3-pip \
+            python3-venv
+          echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee /etc/apt/sources.list.d/sbt.list
+          curl -fsSL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x99E82A75642AC823" | sudo apt-key add -
+          sudo apt-get update
+          sudo apt-get install -y sbt
+          curl -sSL "http://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz" -o /tmp/thrift.tar.gz
+          sudo rm -rf /usr/src/thrift
+          sudo mkdir -p /usr/src/thrift
+          sudo tar zxf /tmp/thrift.tar.gz -C /usr/src/thrift --strip-components=1
+          cd /usr/src/thrift
+          sudo ./configure --without-python --without-cpp
+          sudo make -j2
+          sudo make install
+          python3 -m pip install --break-system-packages build
+          thrift -version
+      - name: Build Chronon quickstart and service jars
+        env:
+          CHRONON_REPO: ${{ github.workspace }}/.chronon
+        run: |
+          cd "${CHRONON_REPO}/quickstart/mongo-online-impl"
+          sbt assembly
+          cd "${CHRONON_REPO}"
+          sbt "project service" assembly
+      - name: Run Chronon unit and stub integration tests
+        run: |
+          uv run pytest -c sdk/python/pytest.ini \
+            sdk/python/tests/unit/infra/online_stores/chronon_online_store \
+            sdk/python/tests/unit/infra/offline_stores/contrib/chronon_offline_store \
+            sdk/python/tests/integration/online_store/test_chronon_online_store.py \
+            sdk/python/tests/integration/offline_store/test_chronon_offline_store.py \
+            --integration
+      - name: Start live Chronon service
+        env:
+          CHRONON_REPO: ${{ github.workspace }}/.chronon
+        run: infra/scripts/chronon/start-local-chronon-service.sh
+      - name: Run Chronon live-service integration test
+        env:
+          CHRONON_SERVICE_URL: http://127.0.0.1:9000
+        run: |
+          uv run pytest -c sdk/python/pytest.ini \
+            sdk/python/tests/integration/online_store/test_chronon_online_store_real_service.py \
+            --integration
+      - name: Stop live Chronon service
+        if: always()
+        run: infra/scripts/chronon/stop-local-chronon-service.sh
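
Review note: the `if:` condition on the `chronon-python-tests` job is dense, so here is a rough Python restatement of how it appears to behave. The function name `should_run` and its parameters are hypothetical and exist only for illustration; the label names and the base repository string are taken from the workflow. GitHub's expression comparisons ignore case, which this sketch does not model.

```python
# Labels that unlock the job, copied from the workflow's `if:` expression.
TRIGGER_LABELS = {"approved", "lgtm", "ok-to-test"}


def should_run(action, added_label, pr_labels, base_repo):
    """Illustrative mirror of the job condition (hypothetical helper)."""
    if action == "labeled":
        # A `labeled` event only counts when the label just added is a trigger label.
        label_ok = added_label in TRIGGER_LABELS
    else:
        # Other events (opened, synchronize) need a trigger label already on the PR.
        label_ok = any(label in TRIGGER_LABELS for label in pr_labels)
    # The job never runs for pull requests whose base is not feast-dev/feast.
    return label_ok and base_repo == "feast-dev/feast"
```

One consequence worth noticing: adding an unrelated label to a pull request that already carries `lgtm` does not start the job, because the `labeled` branch looks only at the label that was just added.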

docs/SUMMARY.md

Lines changed: 3 additions & 0 deletions
@@ -125,6 +125,7 @@
   * [Clickhouse (contrib)](reference/data-sources/clickhouse.md)
   * [Ray (contrib)](reference/data-sources/ray.md)
   * [MongoDB (contrib)](reference/data-sources/mongodb.md)
+  * [Chronon (contrib)](reference/data-sources/chronon.md)
 * [Offline stores](reference/offline-stores/README.md)
   * [Overview](reference/offline-stores/overview.md)
   * [Dask](reference/offline-stores/dask.md)
@@ -142,6 +143,7 @@
   * [Oracle (contrib)](reference/offline-stores/oracle.md)
   * [Athena (contrib)](reference/offline-stores/athena.md)
   * [MongoDB (contrib)](reference/offline-stores/mongodb.md)
+  * [Chronon (contrib)](reference/offline-stores/chronon.md)
   * [Remote Offline](reference/offline-stores/remote-offline-store.md)
   * [Hybrid](reference/offline-stores/hybrid.md)
 * [Online stores](reference/online-stores/README.md)
@@ -164,6 +166,7 @@
   * [SingleStore](reference/online-stores/singlestore.md)
   * [Milvus](reference/online-stores/milvus.md)
   * [MongoDB](reference/online-stores/mongodb.md)
+  * [Chronon (contrib)](reference/online-stores/chronon.md)
   * [Elasticsearch](reference/online-stores/elasticsearch.md)
   * [Qdrant](reference/online-stores/qdrant.md)
   * [Faiss](reference/online-stores/faiss.md)
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+# Chronon source (contrib)
+
+## Description
+
+Chronon sources describe feature data produced by [Chronon](https://chronon.ai/) and consumed by Feast.
+They point Feast at Chronon's offline materialization output and, when online reads are needed, identify the Chronon Join or GroupBy that should be queried from Chronon's online service.
+
+Feast does not compute or materialize Chronon features. Chronon owns feature computation, backfills, consistency, and online serving. Feast uses the source metadata for registry, discovery, historical retrieval, and online lookup.
+
+## Examples
+
+Defining a Chronon source for a Chronon Join:
+
+```python
+from feast.infra.offline_stores.contrib.chronon_offline_store.chronon_source import (
+    ChrononSource,
+)
+
+driver_stats_source = ChrononSource(
+    name="driver_stats",
+    materialization_path="data/chronon/driver_stats",
+    chronon_join="team/driver_stats.v1",
+    online_endpoint="http://localhost:8080",
+    timestamp_field="event_timestamp",
+    created_timestamp_column="created_timestamp",
+)
+```
+
+Defining a Chronon source for a Chronon GroupBy:
+
+```python
+driver_profile_source = ChrononSource(
+    name="driver_profile",
+    materialization_path="data/chronon/driver_profile",
+    chronon_group_by="team/driver_profile.v1",
+    timestamp_field="event_timestamp",
+)
+```
+
+## Configuration reference
+
+| Parameter | Required | Description |
+| :------------------------- | :------- | :---------- |
+| `materialization_path` | yes | Local or repository-relative path to Chronon's Parquet materialization output. |
+| `chronon_join` | no | Chronon Join name used for online reads, for example `team/training_set.v1`. |
+| `chronon_group_by` | no | Chronon GroupBy name used for online reads, for example `team/user_features.v1`. |
+| `online_endpoint` | no | Chronon online service base URL for this source. If omitted, Feast uses the Chronon online store `path`. |
+| `timestamp_field` | yes | Event timestamp column in the materialized Chronon data. |
+| `created_timestamp_column` | no | Optional created timestamp column used to select the latest row when duplicate event timestamps exist. |
+| `field_mapping` | no | Standard Feast field mapping applied before retrieval. |
+
+Set at most one of `chronon_join` and `chronon_group_by`. Offline-only sources may omit both, but `online_endpoint` requires one of them so Feast can build the Chronon request URL.
+
+## Supported Types
+
+Chronon sources read Parquet data through PyArrow and use Feast's standard PyArrow type mapping.
+For a comparison against other batch data sources, please see [here](overview.md#functionality-matrix).
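
Review note: the rules in the configuration reference (at most one of `chronon_join` and `chronon_group_by`, `online_endpoint` requiring one of them, and the fallback to the online store `path`) can be restated as a small check. This is a minimal sketch under assumptions: `ChrononSourceSpec`, `validate`, and `resolve_endpoint` are hypothetical names made up for illustration and are not the actual `ChrononSource` implementation in this commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChrononSourceSpec:
    """Hypothetical stand-in for the documented parameters, not the Feast class."""

    materialization_path: str
    timestamp_field: str
    chronon_join: Optional[str] = None
    chronon_group_by: Optional[str] = None
    online_endpoint: Optional[str] = None


def validate(spec):
    """Raise ValueError when the documented constraints are violated."""
    if spec.chronon_join and spec.chronon_group_by:
        raise ValueError("Set at most one of chronon_join and chronon_group_by.")
    if spec.online_endpoint and not (spec.chronon_join or spec.chronon_group_by):
        raise ValueError("online_endpoint requires chronon_join or chronon_group_by.")


def resolve_endpoint(spec, online_store_path):
    """Prefer the source-level endpoint; otherwise use the online store `path`."""
    return spec.online_endpoint or online_store_path
```

Under this reading, an offline-only source that sets neither name and no endpoint passes validation unchanged.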
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
+# Chronon offline store (contrib)
+
+## Description
+
+The Chronon offline store provides support for reading [ChrononSources](../data-sources/chronon.md) from Chronon's Parquet materialization output.
+
+Chronon remains the system of record for feature computation and materialization. Feast reads Chronon-produced data for historical retrieval, feature reuse, and registry-driven training workflows.
+
+## Getting started
+
+Chronon-backed feature repos use the Chronon offline store with Chronon sources:
+
+{% code title="feature_store.yaml" %}
+```yaml
+project: my_project
+registry: data/registry.db
+provider: chronon
+offline_store:
+  type: chronon
+online_store:
+  type: chronon
+  path: http://localhost:8080
+```
+{% endcode %}
+
+Example feature view:
+
+```python
+from feast import Entity, FeatureView, Field
+from feast.infra.offline_stores.contrib.chronon_offline_store.chronon_source import (
+    ChrononSource,
+)
+from feast.types import Float32
+
+driver = Entity(name="driver", join_keys=["driver_id"])
+
+driver_stats = FeatureView(
+    name="driver_stats",
+    entities=[driver],
+    schema=[Field(name="rating", dtype=Float32)],
+    source=ChrononSource(
+        name="driver_stats_source",
+        materialization_path="data/chronon/driver_stats",
+        chronon_join="team/driver_stats.v1",
+        timestamp_field="event_timestamp",
+        created_timestamp_column="created_timestamp",
+    ),
+)
+```
+
+## Historical retrieval
+
+`get_historical_features` performs point-in-time joins against Chronon's materialized Parquet data. For each entity row, Feast selects the latest Chronon row with an event timestamp at or before the entity timestamp. If `created_timestamp_column` is configured and duplicate event timestamps exist, the latest created row wins.
+
+This is intended for training and validation workflows that want Feast's registry and retrieval APIs while using Chronon as the feature computation engine.
+
+## Configuration reference
+
+| Parameter | Required | Default | Description |
+| :-------- | :------- | :------ | :---------- |
+| `type` | yes | - | Must be set to `chronon`. |
+| `path` | no | - | Reserved for future offline store configuration. Source-level `materialization_path` controls where data is read from. |
+
+## Functionality Matrix
+
+The set of functionality supported by offline stores is described in detail [here](overview.md#functionality).
+Below is a matrix indicating which functionality is supported by the Chronon offline store.
+
+| | Chronon |
+| :----------------------------------------------------------------- | :------ |
+| `get_historical_features` (point-in-time correct join) | yes |
+| `pull_latest_from_table_or_query` (retrieve latest feature values) | yes |
+| `pull_all_from_table_or_query` (retrieve a saved dataset) | yes |
+| `offline_write_batch` (persist dataframes to offline store) | no |
+| `write_logged_features` (persist logged features to offline store) | no |
+
+Below is a matrix indicating which functionality is supported by `ChrononRetrievalJob`.
+
+| | Chronon |
+| ----------------------------------------------------- | ------- |
+| export to dataframe | yes |
+| export to arrow table | yes |
+| export to arrow batches | no |
+| export to SQL | no |
+| export to data lake (S3, GCS, etc.) | no |
+| export to data warehouse | no |
+| export as Spark dataframe | no |
+| local execution of Python-based on-demand transforms | no |
+| remote execution of Python-based on-demand transforms | no |
+| persist results in the offline store | no |
+| preview the query plan before execution | no |
+| read partitioned data | yes |
+
+To compare this set of functionality against other offline stores, please see the full [functionality matrix](overview.md#functionality-matrix).
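
Review note: the selection rule in the "Historical retrieval" section (latest event timestamp at or before the entity timestamp, ties broken by the latest created timestamp) can be illustrated in plain Python. This is a sketch of the rule as documented, not the store's actual implementation; `point_in_time_row` and the sample rows are hypothetical, and the rows are assumed to be already filtered to a single entity key.

```python
from datetime import datetime


def point_in_time_row(entity_ts, feature_rows):
    """Pick the feature row to join for one entity row (hypothetical helper).

    feature_rows: dicts with "event_timestamp" and, optionally, "created_timestamp".
    Returns None when no row is at or before entity_ts.
    """
    # Only rows that were valid at or before the entity timestamp are eligible.
    eligible = [r for r in feature_rows if r["event_timestamp"] <= entity_ts]
    if not eligible:
        return None
    # Latest event timestamp wins; the created timestamp breaks ties.
    return max(
        eligible,
        key=lambda r: (r["event_timestamp"], r.get("created_timestamp", datetime.min)),
    )


rows = [
    {"event_timestamp": datetime(2024, 1, 1), "created_timestamp": datetime(2024, 1, 2), "rating": 4.1},
    {"event_timestamp": datetime(2024, 1, 3), "created_timestamp": datetime(2024, 1, 3), "rating": 4.5},
    {"event_timestamp": datetime(2024, 1, 3), "created_timestamp": datetime(2024, 1, 4), "rating": 4.7},
    {"event_timestamp": datetime(2024, 1, 5), "created_timestamp": datetime(2024, 1, 5), "rating": 4.9},
]

selected = point_in_time_row(datetime(2024, 1, 4), rows)
print(selected["rating"])  # 4.7: the later-created of the two 2024-01-03 rows
```

The 2024-01-05 row is excluded because it lies after the entity timestamp, which is what keeps the join free of future data.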
