
No support for null UnixTimestamp #2803

@aapope

Description

Expected Behavior

This script should return online features, with one null value in last_purchase_date:

from datetime import datetime

import pandas as pd

from feast import Entity, FeatureStore, FeatureView, Field
from feast.infra.offline_stores.file_source import FileSource
from feast.repo_config import RepoConfig
from feast.types import Int32, UnixTimestamp


# create dataset
pd.DataFrame([
    {"user_id": 1, "event_timestamp": datetime(2022, 5, 1), "created": datetime(2022, 5, 1), "purchases": 3, "last_purchase_date": datetime(2022, 4, 23, 13, 4, 1)},
    {"user_id": 2, "event_timestamp": datetime(2022, 5, 2), "created": datetime(2022, 5, 2), "purchases": 1, "last_purchase_date": datetime(2022, 2, 1, 11, 4, 1)},
    {"user_id": 3, "event_timestamp": datetime(2022, 5, 2), "created": datetime(2022, 5, 2), "purchases": 0, "last_purchase_date": None},    
]).to_parquet('user_stats.parquet')


user = Entity(name="user_id", description="user id")

user_stats_view = FeatureView(
    name="user_stats",
    entities=[user],
    source=FileSource(
        path="user_stats.parquet",
        timestamp_field="event_timestamp",
        created_timestamp_column="created",
    ),
    schema=[
        Field(name="purchases", dtype=Int32),
        Field(name="last_purchase_date", dtype=UnixTimestamp),
    ]
)

online_store_path = 'online_store.db'
registry_path = 'registry.db'

repo = RepoConfig(
    registry=registry_path,
    project='feature_store',
    provider="local",
    offline_store="file",
    online_store={"type": "sqlite", "path": online_store_path},
)

fs = FeatureStore(config=repo)

fs.apply([user, user_stats_view])

fs.materialize_incremental(end_date=datetime.utcnow())


entity_rows = [{"user_id": i} for i in range(1, 4)]


feature_df = fs.get_online_features(
    features=[
        "user_stats:purchases",
        "user_stats:last_purchase_date",
    ],
    entity_rows=entity_rows
).to_df()
print(feature_df)

Current Behavior

Materializing 1 feature views to 2022-06-16 20:30:41-06:00 into the sqlite online store.

Since the ttl is 0 for feature view user_stats, the start date will be set to 1 year before the current time.
user_stats from 2021-06-17 20:30:41-06:00 to 2022-06-16 20:30:41-06:00:
100%|████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 420.36it/s]
Traceback (most recent call last):
  File "null_timestamp_example.py", line 59, in <module>
    feature_df = fs.get_online_features(
  File "/Users/apope/feast/sdk/python/feast/online_response.py", line 79, in to_df
    return pd.DataFrame(self.to_dict(include_event_timestamps))
  File "/Users/apope/feast/sdk/python/feast/online_response.py", line 59, in to_dict
    response[feature_ref] = [
  File "/Users/apope/feast/sdk/python/feast/online_response.py", line 60, in <listcomp>
    feast_value_type_to_python_type(v) for v in feature_vector.values
  File "/Users/apope/feast/sdk/python/feast/type_map.py", line 74, in feast_value_type_to_python_type
    val = datetime.fromtimestamp(val, tz=timezone.utc)
OSError: [Errno 84] Value too large to be stored in data type

Steps to reproduce

Run the script under Expected Behavior above.

Specifications

  • Version: 0.21.2
  • Platform: Mac
  • Subsystem:

Possible Solution

This happens because, while materializing, _python_datetime_to_int_timestamp() converts the NaT value to -9223372036854775808:

In [108]:
from typing import cast, Sequence
import numpy as np

cast(Sequence[np.int_], np.array(['nat'], dtype='datetime64[ns]').astype('datetime64[s]').astype(np.int_))
Out [108]:
array([-9223372036854775808])

Which is out of range for datetime.fromtimestamp():

In [109]:
from datetime import datetime, timezone

datetime.fromtimestamp(-9223372036854775808, tz=timezone.utc)

Truncated traceback:
/tmp/ipykernel_58/1143168205.py in <module>
      1 from datetime import datetime, timezone
      2 
----> 3 val = datetime.fromtimestamp(-9223372036854775808, tz=timezone.utc)

OSError: [Errno 75] Value too large for defined data type
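That sentinel is deterministic: it is exactly np.iinfo(np.int64).min, the integer NumPy uses to represent NaT, so a deserializer can detect it with a plain equality check. A quick verification (not Feast code):

```python
import numpy as np

# NaT cast through the same path the materialization code uses
sentinel = int(
    np.array(["NaT"], dtype="datetime64[ns]")
    .astype("datetime64[s]")
    .astype(np.int64)[0]
)
print(sentinel == np.iinfo(np.int64).min)  # True: -9223372036854775808
```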

A simple fix would be to leave the materialization logic as-is and, when deserializing in feast_value_type_to_python_type(), catch this one sentinel value and return None instead.
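A minimal sketch of that catch, assuming it would sit alongside the UnixTimestamp branch in type_map.py; unix_timestamp_to_python is a hypothetical helper, not the actual Feast function signature:

```python
from datetime import datetime, timezone

import numpy as np

# Integer that NaT becomes when materialization casts datetime64 -> int64
NAT_SENTINEL = int(np.datetime64("NaT", "s").astype(np.int64))


def unix_timestamp_to_python(val: int):
    """Hypothetical deserializer mirroring the UnixTimestamp branch of
    feast_value_type_to_python_type(): map serialized NaT back to None."""
    if val == NAT_SENTINEL:
        return None  # missing timestamp, surface as null instead of raising
    return datetime.fromtimestamp(val, tz=timezone.utc)
```

With a guard like this, user 3 in the repro above would come back with last_purchase_date = None instead of raising OSError.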
