Expected Behavior
Registry protobuf should not have duplicate data source definitions.
Current Behavior
Every time we run feast apply, we notice that a data source definition with the exact same details is appended to the registry protobuf.
from feast.protos.feast.core.Registry_pb2 import Registry as RegistryProto

with open('registry.db', 'rb') as f:
    data = f.read()
print(RegistryProto.FromString(data))

This is what the protobuf looks like after running feast apply 3 times in a row, without making any changes to the definitions:
entities {
  spec {
    name: "__dummy"
    value_type: STRING
    join_key: "__dummy_id"
    project: "myproject"
  }
  meta {
    created_timestamp {
      seconds: 1650485581
      nanos: 37674000
    }
    last_updated_timestamp {
      seconds: 1650485581
      nanos: 37723000
    }
  }
}
registry_schema_version: "1"
version_id: "743b0630-373b-405f-861d-3de738962d7c"
last_updated {
  seconds: 1650485581
  nanos: 37831000
}
feature_views {
  spec {
    name: "IRIS"
    project: "myproject"
    entities: "__dummy"
    features {
      name: "PETAL_LENGTH"
      value_type: FLOAT
    }
    features {
      name: "PETAL_WIDTH"
      value_type: FLOAT
    }
    features {
      name: "SEPAL_LENGTH"
      value_type: FLOAT
    }
    features {
      name: "SEPAL_WIDTH"
      value_type: FLOAT
    }
    features {
      name: "SPECIES"
      value_type: INT64
    }
    ttl {
      seconds: 31449600
    }
    batch_source {
      type: BATCH_SNOWFLAKE
      timestamp_field: "EVENT_TIMESTAMP"
      created_timestamp_column: "CREATE_TIMESTAMP"
      data_source_class_type: "feast.infra.offline_stores.snowflake_source.SnowflakeSource"
      snowflake_options {
        table: "IRIS"
        schema: "MY_SCHEMA"
        database: "MY_DATABASE"
      }
    }
    online: true
    description: "A sample feature view containing the Iris dataset."
  }
  meta {
    created_timestamp {
      seconds: 1650485581
      nanos: 37333000
    }
    last_updated_timestamp {
      seconds: 1650485581
      nanos: 37333000
    }
  }
}
data_sources {
  type: BATCH_SNOWFLAKE
  timestamp_field: "EVENT_TIMESTAMP"
  created_timestamp_column: "CREATE_TIMESTAMP"
  data_source_class_type: "feast.infra.offline_stores.snowflake_source.SnowflakeSource"
  snowflake_options {
    table: "IRIS"
    schema: "MY_SCHEMA"
    database: "MY_DATABASE"
  }
  project: "myproject"
}
data_sources {
  type: BATCH_SNOWFLAKE
  timestamp_field: "EVENT_TIMESTAMP"
  created_timestamp_column: "CREATE_TIMESTAMP"
  data_source_class_type: "feast.infra.offline_stores.snowflake_source.SnowflakeSource"
  snowflake_options {
    table: "IRIS"
    schema: "MY_SCHEMA"
    database: "MY_DATABASE"
  }
  project: "myproject"
}
data_sources {
  type: BATCH_SNOWFLAKE
  timestamp_field: "EVENT_TIMESTAMP"
  created_timestamp_column: "CREATE_TIMESTAMP"
  data_source_class_type: "feast.infra.offline_stores.snowflake_source.SnowflakeSource"
  snowflake_options {
    table: "IRIS"
    schema: "MY_SCHEMA"
    database: "MY_DATABASE"
  }
  project: "myproject"
}
Steps to reproduce
Here's the feature_store.yaml:
project: myproject
registry: s3://mybucket/registry.db
provider: local
offline_store:
  type: snowflake.offline
  account: myaccount
  user: myuser
  password: mypassword
  role: myrole
  warehouse: mywarehouse
  database: mydatabase
online_store:
  type: redis
  connection_string: myredisconnectionstring

And here's the feature definition, feature_definitions/iris.py:
from datetime import timedelta
from pathlib import Path

from feast import Feature, FeatureView, SnowflakeSource, ValueType

name = Path(__file__).stem

iris_source = SnowflakeSource(
    schema="MY_SCHEMA",
    database="MY_DATABASE",
    table="IRIS",
    timestamp_field="EVENT_TIMESTAMP",
    created_timestamp_column="CREATE_TIMESTAMP",
)

iris = FeatureView(
    name=name.upper(),
    entities=[],
    ttl=timedelta(weeks=52),
    schema=[
        Feature(name="PETAL_LENGTH", dtype=ValueType.FLOAT),
        Feature(name="PETAL_WIDTH", dtype=ValueType.FLOAT),
        Feature(name="SEPAL_LENGTH", dtype=ValueType.FLOAT),
        Feature(name="SEPAL_WIDTH", dtype=ValueType.FLOAT),
        Feature(name="SPECIES", dtype=ValueType.INT64),
    ],
    online=True,
    source=iris_source,
    description="A sample feature view containing the Iris dataset.",
)
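In case it helps anyone else hitting this, here is the rough workaround we use to strip the duplicates from a local copy of the registry before re-uploading it. This is hand-rolled proto surgery under our own assumptions (local registry.db path, byte-identical duplicates), not a supported Feast API:

from feast.protos.feast.core.Registry_pb2 import Registry as RegistryProto

with open('registry.db', 'rb') as f:
    registry_proto = RegistryProto.FromString(f.read())

# Keep only the first occurrence of each identical data source definition,
# comparing by serialized bytes.
seen = set()
unique_sources = []
for ds in registry_proto.data_sources:
    key = ds.SerializeToString()
    if key not in seen:
        seen.add(key)
        copied = type(ds)()  # detached copy, safe to re-append after clearing
        copied.CopyFrom(ds)
        unique_sources.append(copied)

del registry_proto.data_sources[:]
registry_proto.data_sources.extend(unique_sources)

with open('registry.db', 'wb') as f:
    f.write(registry_proto.SerializeToString())

Note that the next feast apply appends another copy again, so this only cleans the registry up after the fact.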
Specifications
- Version: 0.2.0
- Platform: AWS EC2 (Linux2 AMI)
- Subsystem: