145 changes: 145 additions & 0 deletions python_local/README.md
@@ -0,0 +1,145 @@
# Benchmarking Python Feature Server

Here we provide tools for benchmarking the Python-based feature server with one online store, Redis, on a local Linux machine. Follow the instructions below to reproduce the benchmarks.

_Tested with: `feast 0.37.1`_

## Prerequisites

You need to have the following installed:
* Python `3.9+`
* Feast `0.37.0+`
* Docker
* Docker Compose `v2.x`
* Vegeta
* `parquet-tools`



## Generate Data

For all of the following benchmarks, you need to generate the data with `data_generator.py`, located in the top-level directory of this repo. `cd` to the top-level directory and run `python data_generator.py`. Be aware that the timestamps in the generated parquet file effectively expire: if you try to use the generated data on a different day, the `feast materialize-incremental` command in Step 3 will fail. If no feature data is written to Redis, generate the fake data again.
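One plausible reading of this expiration effect is sketched below. It assumes the cause is the one-day `ttl` set on the feature views in `example.py`, and that materialization only picks up rows whose timestamp lies within one `ttl` of the end date. The helper name `is_fresh` and the example dates are illustrative, not part of the repo.

```python
from datetime import datetime, timedelta, timezone

# Assumption: mirrors ttl=datetime.timedelta(days=1) in example.py.
FEATURE_VIEW_TTL = timedelta(days=1)


def is_fresh(event_timestamp, now, ttl=FEATURE_VIEW_TTL):
    """Return True if the row's timestamp lies inside [now - ttl, now]."""
    return now - ttl <= event_timestamp <= now


# Illustrative timestamps only.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
generated_today = now - timedelta(hours=3)
generated_two_days_ago = now - timedelta(days=2)

print(is_fresh(generated_today, now))         # True
print(is_fresh(generated_two_days_ago, now))  # False
```

Under these assumptions, data generated two days ago falls outside the window, which would match the advice to regenerate it.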

The generated parquet file contains:
1. 252 columns: an `entity` column, an `event_timestamp` column, and 250 fake `feature_[*]` columns.
2. 10,000 rows.
3. DataFrame values that are random integers.
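As a quick sanity check on the column count, the layout can be reconstructed in a few lines. This is a sketch that assumes the feature columns are named `feature_0` through `feature_249`, as the `feature_{10 * i + j}` pattern in `example.py` suggests; the authoritative names come from `data_generator.py`.

```python
# Hypothetical reconstruction of the column names; not read from the parquet file.
NUM_FEATURES = 250
columns = ["entity", "event_timestamp"] + [f"feature_{i}" for i in range(NUM_FEATURES)]

# example.py groups the features into 25 feature views of 10 features each.
views = {
    f"feature_view_{i}": [f"feature_{10 * i + j}" for j in range(10)]
    for i in range(25)
}

# Flatten the views and compare against the feature columns.
covered = [name for names in views.values() for name in names]

print(len(columns))            # 252
print(covered == columns[2:])  # True
```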

The contents of the parquet file can be checked with the following example commands:
1. `parquet-tools inspect generated_data.parquet`
2. `parquet-tools show --head 2 generated_data.parquet`



## Redis

1. Disable the Feast usage feature (`FEAST_USAGE`), then apply the feature definitions to create a Feast repo.

```
export FEAST_USAGE=False
cd python/feature_repos/redis
feast apply
```

2. Deploy Redis & feature servers using docker-compose

```
cd ../../docker/redis
docker-compose up -d
```
If everything goes well, you should see an output like this:

```
Creating redis_redis_1 ... done
Creating redis_feast_1 ... done
Creating redis_feast_2 ... done
Creating redis_feast_3 ... done
Creating redis_feast_4 ... done
Creating redis_feast_5 ... done
Creating redis_feast_6 ... done
Creating redis_feast_7 ... done
Creating redis_feast_8 ... done
Creating redis_feast_9 ... done
Creating redis_feast_10 ... done
Creating redis_feast_11 ... done
Creating redis_feast_12 ... done
Creating redis_feast_13 ... done
Creating redis_feast_14 ... done
Creating redis_feast_15 ... done
Creating redis_feast_16 ... done
```
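The 16 feature servers should then be reachable on consecutive host ports. The sketch below lists their endpoints, assuming Docker assigns each replica one host port from the `6566-6581` range published in `docker-compose.yml`; the function name `feature_server_urls` is illustrative.

```python
# Assumption: values taken from docker-compose.yml ("6566-6581:6566", replicas: 16).
FIRST_PORT = 6566
REPLICAS = 16


def feature_server_urls(host="127.0.0.1"):
    """Return one get-online-features URL per feature server replica."""
    return [
        f"http://{host}:{FIRST_PORT + i}/get-online-features"
        for i in range(REPLICAS)
    ]


urls = feature_server_urls()
print(len(urls))  # 16
print(urls[0])    # http://127.0.0.1:6566/get-online-features
print(urls[-1])   # http://127.0.0.1:6581/get-online-features
```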

3. Materialize data to Redis

```
cd ../../feature_repos/redis
# This is unfortunately necessary because inside docker feature servers resolve
# Redis host name as `redis`, but since we're running materialization from shell,
# Redis is accessible on localhost:
sed -i 's/redis:6379/localhost:6379/g' feature_store.yaml
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
# Make sure to change this back, since leaving it can break the feature servers
# if you run another docker-compose command later:
sed -i 's/localhost:6379/redis:6379/g' feature_store.yaml
```
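If forgetting the second `sed` is a concern, the swap can be wrapped so that the original file is always restored. This is a minimal sketch, not part of the repo; the context manager name `redis_on_localhost` is hypothetical, and the demo runs against a temporary copy rather than the real `feature_store.yaml`.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def redis_on_localhost(config_path, inside="redis:6379", outside="localhost:6379"):
    """Temporarily point the config at localhost, then restore the original text."""
    path = Path(config_path)
    original = path.read_text()
    path.write_text(original.replace(inside, outside))
    try:
        yield path
    finally:
        # Runs even if the body raises, so the feature servers keep their config.
        path.write_text(original)


# Demo on a throwaway file with the same connection string as feature_store.yaml.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "feature_store.yaml"
    demo.write_text("online_store:\n  type: redis\n  connection_string: redis:6379\n")
    with redis_on_localhost(demo):
        during = demo.read_text()
    after = demo.read_text()

print("localhost:6379" in during)  # True
print("redis:6379" in after)       # True
```

In real use, the `feast materialize-incremental` call from this step would run inside the `with` block.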

4. Check that feature servers are working & they have materialized data

```
cd ../../..
parquet-tools show --columns entity generated_data.parquet 2>/dev/null | head -n 6
```
This should return something like this:

```
+----------+
| entity |
|----------|
| 94 |
| 1992 |
| 4475 |
```

Put these numbers into an env variable with:

```
TEST_ENTITY_IDS=`parquet-tools show --columns entity generated_data.parquet 2>/dev/null | head -n 6 | tail -n 3 | sed 's/|//g' | paste -d, -s`
echo $TEST_ENTITY_IDS
```
(which should output something like `94 , 1992 , 4475 `)
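The same extraction can be done in Python when the shell pipeline is inconvenient. The sketch below parses the table text shown above; the function name `parse_entity_ids` is illustrative, and it assumes entity IDs are non-negative integers.

```python
def parse_entity_ids(table_text, limit=3):
    """Extract up to `limit` integer IDs from parquet-tools table output."""
    ids = []
    for line in table_text.splitlines():
        # Drop the table borders and padding around each cell.
        cell = line.strip().strip("|").strip()
        if cell.isdigit():  # skips the header and separator rows
            ids.append(int(cell))
        if len(ids) == limit:
            break
    return ids


sample = """\
+----------+
| entity |
|----------|
| 94 |
| 1992 |
| 4475 |
"""

ids = parse_entity_ids(sample)
print(",".join(str(i) for i in ids))  # 94,1992,4475
```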


Query the feature server with

```
curl -X POST \
"http://127.0.0.1:6566/get-online-features" \
-H "accept: application/json" \
-d "{
\"feature_service\": \"feature_service_0\",
\"entities\": {
\"entity\": [$TEST_ENTITY_IDS]
}
}" | jq
```
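For scripted checks, the same request can be built with the Python standard library. This sketch mirrors the URL, header, and body of the `curl` call above; the function name `build_request` is illustrative, and the call that actually sends the request is left commented out because it needs a running feature server.

```python
import json
import urllib.request


def build_request(entity_ids, feature_service="feature_service_0",
                  url="http://127.0.0.1:6566/get-online-features"):
    """Build the POST request that the curl command above sends."""
    body = {
        "feature_service": feature_service,
        "entities": {"entity": list(entity_ids)},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"accept": "application/json"},
        method="POST",
    )


request = build_request([94, 1992, 4475])
print(request.get_method())
print(request.data.decode("utf-8"))

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```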


In the output, make sure that the `"values"` fields contain no null
values. It should look something like this:

```
{
"values": [
4475,
1551,
9889,
```
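Scanning a long response for nulls by eye is error-prone, so a small checker can help. The sketch below walks any parsed JSON document and reports nulls found inside `"values"` lists; it does not rely on the exact response schema, and the `"results"` key in the sample is a made-up stand-in, not taken from real server output.

```python
def find_nulls(payload, path="$"):
    """Yield the location of every null inside any "values" list."""
    if isinstance(payload, dict):
        for key, value in payload.items():
            if key == "values" and isinstance(value, list):
                for i, item in enumerate(value):
                    if item is None:
                        yield f"{path}.{key}[{i}]"
            else:
                yield from find_nulls(value, f"{path}.{key}")
    elif isinstance(payload, list):
        for i, item in enumerate(payload):
            yield from find_nulls(item, f"{path}[{i}]")


# Hypothetical response shape, for illustration only.
sample = {"results": [{"values": [4475, 1551, 9889]}, {"values": [1, None]}]}

print(list(find_nulls(sample)))  # ['$.results[1].values[1]']
```

An empty list means every `"values"` entry was populated, which is the condition this step asks for.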

5. Run Benchmarks

```
cd python
./run-benchmark.sh > perf.log
```

The vegeta report (the benchmark results) is written to the `perf.log` file.
12 changes: 12 additions & 0 deletions python_local/docker/redis/Dockerfile
@@ -0,0 +1,12 @@
FROM python:3.9

RUN pip3 install 'feast[redis]==0.37.1'
RUN pip3 install cffi

COPY feature_repos/redis feature_repo

WORKDIR feature_repo

ENV FEAST_USAGE=False

CMD feast serve --host "0.0.0.0" --port 6566
15 changes: 15 additions & 0 deletions python_local/docker/redis/docker-compose.yml
@@ -0,0 +1,15 @@
services:
  feast:
    build:
      context: ../..
      dockerfile: docker/redis/Dockerfile
    ports:
      - "6566-6581:6566"
    deploy:
      replicas: 16
    links:
      - redis
  redis:
    image: redis
    ports:
      - "6379:6379"
45 changes: 45 additions & 0 deletions python_local/feature_repos/redis/example.py
@@ -0,0 +1,45 @@
import datetime

from feast import Entity, Field, FeatureView, FileSource, FeatureService, ValueType
from feast.types import Int64

generated_data_source = FileSource(
    path="../../../generated_data.parquet",
    event_timestamp_column="event_timestamp",
)

entity = Entity(
    name="entity",
    value_type=ValueType.INT64,
)

feature_views = [
    FeatureView(
        name=f"feature_view_{i}",
        entities=[entity],
        ttl=datetime.timedelta(days=1),
        schema=[
            Field(name=f"feature_{10 * i + j}", dtype=Int64)
            for j in range(10)
        ],
        online=True,
        source=generated_data_source,
    )
    for i in range(25)
]

feature_services = [
    FeatureService(
        name=f"feature_service_{i}",
        features=feature_views[:5*(i + 1)],
    )
    for i in range(5)
]

def add_definitions_in_globals():
    for i, fv in enumerate(feature_views):
        globals()[f"feature_view_{i}"] = fv
    for i, fs in enumerate(feature_services):
        globals()[f"feature_service_{i}"] = fs

add_definitions_in_globals()
9 changes: 9 additions & 0 deletions python_local/feature_repos/redis/feature_store.yaml
@@ -0,0 +1,9 @@
registry: data/registry.db
project: feature_repo
provider: local
online_store:
  type: redis
  connection_string: redis:6379
offline_store:
  type: file
entity_key_serialization_version: 2