
Commit ebfba52

Add module 2 notebook

Signed-off-by: Danny Chiao <danny@tecton.ai>
1 parent cad91c0 commit ebfba52

2 files changed: +260 −0
# Retrieving on demand features

## 1. Instantiate a `FeatureStore` object

```python
from feast import FeatureStore
import pandas as pd
from datetime import datetime
```

```python
store = FeatureStore(repo_path=".")
```
## 2. Retrieve historical features

### model_v2 feature service
This one leverages dummy `val_to_add` and `val_to_add_2` request data.
```python
entity_df = pd.DataFrame.from_dict(
    {
        "driver_id": [1001, 1002, 1003, 1004],
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
            datetime(2021, 4, 12, 16, 40, 26),
            datetime(2021, 4, 12, 15, 1, 12),
        ],
        "val_to_add": [1, 2, 3, 4],
        "val_to_add_2": [10, 20, 30, 40],
    }
)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=store.get_feature_service("model_v2"),
).to_df()
print(training_df.head())
```

Output:

```
      driver_id           event_timestamp  val_to_add  val_to_add_2  \
360        1001 2021-04-12 10:59:42+00:00           1            10
721        1002 2021-04-12 08:12:10+00:00           2            20
1084       1003 2021-04-12 16:40:26+00:00           3            30
1445       1004 2021-04-12 15:01:12+00:00           4            40

      conv_rate  conv_rate_plus_val1  conv_rate_plus_val2
360    0.521149             1.521149            10.521149
721    0.089014             2.089014            20.089014
1084   0.188855             3.188855            30.188855
1445   0.296492             4.296492            40.296492
```
### model_v3 feature service
This one generates geohash features.
```python
entity_df = pd.DataFrame.from_dict(
    {
        "driver_id": [1001, 1002, 1003, 1004],
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
            datetime(2021, 4, 12, 16, 40, 26),
            datetime(2021, 4, 12, 15, 1, 12),
        ],
    }
)

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=store.get_feature_service("model_v3"),
).to_df()
print(training_df.head())
```

Output:

```
      driver_id           event_timestamp  daily_miles_driven       lat  \
360        1001 2021-04-12 10:59:42+00:00           18.926695  1.265647
721        1002 2021-04-12 08:12:10+00:00           12.005569  0.722192
1084       1003 2021-04-12 16:40:26+00:00           23.490234  1.330712
1445       1004 2021-04-12 15:01:12+00:00           19.204191  0.961260

           lon       geohash geohash_1 geohash_2 geohash_3 geohash_4  \
360   1.150815  s00z4nmuzvtv         s        s0       s00      s00z
721   0.290492  s00hne7x0fqj         s        s0       s00      s00h
1084  2.996348  s04ps4jzgyxq         s        s0       s04      s04p
1445  5.048517  s05t6yupwzyu         s        s0       s05      s05t

     geohash_5 geohash_6
360      s00z4    s00z4n
721      s00hn    s00hne
1084     s04ps    s04ps4
1445     s05t6    s05t6y
```
## 3. Retrieve online features

### model_v2 feature service
This one leverages dummy `val_to_add` and `val_to_add_2` request data, so these values are passed in through the `entity_rows` parameter.
```python
features = store.get_online_features(
    features=store.get_feature_service("model_v2"),
    entity_rows=[{"driver_id": 1001, "val_to_add": 1000, "val_to_add_2": 2000}],
).to_dict()
for key, value in sorted(features.items()):
    print(key, " : ", value)
```

Output:

```
conv_rate : [0.4045884609222412]
conv_rate_plus_val1 : [1000.4045884609222]
conv_rate_plus_val2 : [2000.4045884609222]
driver_id : [1001]
```
### model_v3 feature service
This one generates geohash features from latitude and longitude values in the online store.

Note that this feature service relies on a `PushSource`, so no lat / lon values are needed at request time. Perhaps there's a separate thread on the driver's app that asynchronously pushes the driver's location to a Kafka topic.
203+
{
204+
"cell_type": "code",
205+
"execution_count": 11,
206+
"metadata": {},
207+
"outputs": [
208+
{
209+
"name": "stdout",
210+
"output_type": "stream",
211+
"text": [
212+
"daily_miles_driven : [350.6502685546875]\n",
213+
"driver_id : [1001]\n",
214+
"geohash_1 : ['s']\n",
215+
"geohash_2 : ['s0']\n",
216+
"geohash_3 : ['s07']\n",
217+
"geohash_4 : ['s07z']\n",
218+
"geohash_5 : ['s07z6']\n",
219+
"geohash_6 : ['s07z6m']\n",
220+
"lat : [2.71002197265625]\n",
221+
"lon : [5.3769989013671875]\n"
222+
]
223+
}
224+
],
225+
"source": [
226+
"features = store.get_online_features(\n",
227+
" features=store.get_feature_service(\"model_v3\"),\n",
228+
" entity_rows=[{\"driver_id\": 1001}],\n",
229+
").to_dict()\n",
230+
"for key, value in sorted(features.items()):\n",
231+
" print(key, \" : \", value)"
232+
]
233+
}

module_2/data.png (140 KB)
