Closed
Labels
Community Contribution Needed, good first issue, kind/bug, priority/p2
Description
Expected Behavior
This script should return online features, with one null in last_purchase_date:
from feast import Entity, FeatureView
from feast.infra.offline_stores.file_source import FileSource
from feast.repo_config import RegistryConfig, RepoConfig
from feast import FeatureStore
from datetime import datetime
from feast.types import Int32, UnixTimestamp
from feast import Field
import pandas as pd
# create dataset
pd.DataFrame([
    {"user_id": 1, "event_timestamp": datetime(2022, 5, 1), "created": datetime(2022, 5, 1), "purchases": 3, "last_purchase_date": datetime(2022, 4, 23, 13, 4, 1)},
    {"user_id": 2, "event_timestamp": datetime(2022, 5, 2), "created": datetime(2022, 5, 2), "purchases": 1, "last_purchase_date": datetime(2022, 2, 1, 11, 4, 1)},
    {"user_id": 3, "event_timestamp": datetime(2022, 5, 2), "created": datetime(2022, 5, 2), "purchases": 0, "last_purchase_date": None},
]).to_parquet('user_stats.parquet')
user = Entity(name="user_id", description="user id")
user_stats_view = FeatureView(
    name="user_stats",
    entities=[user],
    source=FileSource(
        path="user_stats.parquet",
        timestamp_field="event_timestamp",
        created_timestamp_column="created",
    ),
    schema=[
        Field(name="purchases", dtype=Int32),
        Field(name="last_purchase_date", dtype=UnixTimestamp),
    ]
)
online_store_path = 'online_store.db'
registry_path = 'registry.db'
repo = RepoConfig(
    registry="registry.db",
    project='feature_store',
    provider="local",
    offline_store="file",
    use_ssl=True,
    is_secure=True,
    validate=True,
)
fs = FeatureStore(config=repo)
fs.apply([user, user_stats_view])
fs.materialize_incremental(end_date=datetime.utcnow())
entity_rows = [{"user_id": i} for i in range(1, 4)]
feature_df = fs.get_online_features(
    features=[
        "user_stats:purchases",
        "user_stats:last_purchase_date",
    ],
    entity_rows=entity_rows
).to_df()
print(feature_df)
Current Behavior
Materializing 1 feature views to 2022-06-16 20:30:41-06:00 into the sqlite online store.
Since the ttl is 0 for feature view user_stats, the start date will be set to 1 year before the current time.
user_stats from 2021-06-17 20:30:41-06:00 to 2022-06-16 20:30:41-06:00:
100%|████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 420.36it/s]
Traceback (most recent call last):
  File "null_timestamp_example.py", line 59, in <module>
    feature_df = fs.get_online_features(
  File "/Users/apope/feast/sdk/python/feast/online_response.py", line 79, in to_df
    return pd.DataFrame(self.to_dict(include_event_timestamps))
  File "/Users/apope/feast/sdk/python/feast/online_response.py", line 59, in to_dict
    response[feature_ref] = [
  File "/Users/apope/feast/sdk/python/feast/online_response.py", line 60, in <listcomp>
    feast_value_type_to_python_type(v) for v in feature_vector.values
  File "/Users/apope/feast/sdk/python/feast/type_map.py", line 74, in feast_value_type_to_python_type
    val = datetime.fromtimestamp(val, tz=timezone.utc)
OSError: [Errno 84] Value too large to be stored in data type
Steps to reproduce
Specifications
- Version: 0.21.2
- Platform: Mac
- Subsystem:
Possible Solution
This happens because, during materialization, _python_datetime_to_int_timestamp() converts the NaT value to -9223372036854775808, the minimum 64-bit signed integer:
In [108]:
from typing import cast, Sequence
import numpy as np
cast(Sequence[np.int_], np.array(['nat'], dtype='datetime64[ns]').astype('datetime64[s]').astype(np.int_))
Out [108]:
array([-9223372036854775808])
This value is out of range for datetime.fromtimestamp():
In [109]:
from datetime import datetime, timezone
datetime.fromtimestamp(-9223372036854775808, tz=timezone.utc)
Truncated Traceback (Use C-c C-$ to view full TB):
/tmp/ipykernel_58/1143168205.py in <module>
      1 from datetime import datetime, timezone
      2
----> 3 val = datetime.fromtimestamp(-9223372036854775808, tz=timezone.utc)
OSError: [Errno 75] Value too large for defined data type
A simple fix would be to leave the materialization logic as-is and, when deserializing in feast_value_type_to_python_type(), just catch this one value and return a null instead.
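The proposed fix could look roughly like the sketch below. This is not the Feast code: the helper name timestamp_to_datetime_or_none and the constant NULL_TIMESTAMP_INT are hypothetical, chosen only to illustrate the idea of checking for the NaT sentinel before calling datetime.fromtimestamp(). Where exactly such a check would sit inside feast_value_type_to_python_type() is left open here.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical constant: the int64 minimum, which is the value a NaT
# turns into when cast to an integer (-9223372036854775808).
NULL_TIMESTAMP_INT = -(2**63)


def timestamp_to_datetime_or_none(val: int) -> Optional[datetime]:
    """Convert a Unix timestamp in seconds to an aware UTC datetime.

    Hypothetical helper: returns None for the NaT sentinel instead of
    passing it to datetime.fromtimestamp(), which rejects it.
    """
    if val == NULL_TIMESTAMP_INT:
        return None
    return datetime.fromtimestamp(val, tz=timezone.utc)


# The sentinel maps to None; an ordinary timestamp converts as usual.
print(timestamp_to_datetime_or_none(NULL_TIMESTAMP_INT))
print(timestamp_to_datetime_or_none(0))
```

One trade-off of this approach is that the sentinel stays in the online store, so any other reader of the stored values would need the same check.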