Skip to content

Duplicate data source in feature registry #2581

@tkch3n

Description

@tkch3n

Expected Behavior

Registry protobuf should not have duplicate data source definitions.

Current Behavior

Every time we run feast apply, we noticed that a data source source is appended to the registry protobuf with the exact same details.

from feast.protos.feast.core.Registry_pb2 import Registry as RegistryProto

registry_proto = RegistryProto()

with open('registry.db', 'rb') as f: data = f.read()

print(registry_proto.FromString(data))

This is what the protobuf looks like after running feast apply 3 times in a row without making any changes to the definition.

entities {
  spec {
    name: "__dummy"
    value_type: STRING
    join_key: "__dummy_id"
    project: "myproject"
  }
  meta {
    created_timestamp {
      seconds: 1650485581
      nanos: 37674000
    }
    last_updated_timestamp {
      seconds: 1650485581
      nanos: 37723000
    }
  }
}
registry_schema_version: "1"
version_id: "743b0630-373b-405f-861d-3de738962d7c"
last_updated {
  seconds: 1650485581
  nanos: 37831000
}
feature_views {
  spec {
    name: "IRIS"
    project: "myproject"
    entities: "__dummy"
    features {
      name: "PETAL_LENGTH"
      value_type: FLOAT
    }
    features {
      name: "PETAL_WIDTH"
      value_type: FLOAT
    }
    features {
      name: "SEPAL_LENGTH"
      value_type: FLOAT
    }
    features {
      name: "SEPAL_WIDTH"
      value_type: FLOAT
    }
    features {
      name: "SPECIES"
      value_type: INT64
    }
    ttl {
      seconds: 31449600
    }
    batch_source {
      type: BATCH_SNOWFLAKE
      timestamp_field: "EVENT_TIMESTAMP"
      created_timestamp_column: "CREATE_TIMESTAMP"
      data_source_class_type: "feast.infra.offline_stores.snowflake_source.SnowflakeSource"
      snowflake_options {
        table: "IRIS"
        schema: "MY_SCHEMA"
        database: "MY_DATABASE"
      }
    }
    online: true
    description: "A sample feature view containing the Iris dataset."
  }
  meta {
    created_timestamp {
      seconds: 1650485581
      nanos: 37333000
    }
    last_updated_timestamp {
      seconds: 1650485581
      nanos: 37333000
    }
  }
}
data_sources {
  type: BATCH_SNOWFLAKE
  timestamp_field: "EVENT_TIMESTAMP"
  created_timestamp_column: "CREATE_TIMESTAMP"
  data_source_class_type: "feast.infra.offline_stores.snowflake_source.SnowflakeSource"
  snowflake_options {
    table: "IRIS"
    schema: "MY_SCHEMA"
    database: "MY_DATABASE"
  }
  project: "myproject"
}
data_sources {
  type: BATCH_SNOWFLAKE
  timestamp_field: "EVENT_TIMESTAMP"
  created_timestamp_column: "CREATE_TIMESTAMP"
  data_source_class_type: "feast.infra.offline_stores.snowflake_source.SnowflakeSource"
  snowflake_options {
    table: "IRIS"
    schema: "MY_SCHEMA"
    database: "MY_DATABASE"
  }
  project: "myproject"
}
data_sources {
  type: BATCH_SNOWFLAKE
  timestamp_field: "EVENT_TIMESTAMP"
  created_timestamp_column: "CREATE_TIMESTAMP"
  data_source_class_type: "feast.infra.offline_stores.snowflake_source.SnowflakeSource"
  snowflake_options {
    table: "IRIS"
    schema: "MY_SCHEMA"
    database: "MY_DATABASE"
  }
  project: "myproject"
}

Steps to reproduce

Here's the feature_store.yaml:

project: myproject
registry: s3://mybucket/registry.db
provider: local
offline_store:
  type: snowflake.offline
  account: myaccount
  user: myuser
  password: mypasssword
  role: myrole
  warehouse: mywarehouse
  database: mydatabase
online_store:
  type: redis
  connection_string: myredisconnectionstring

And here's the feature definition feature_definitions/iris.py:

from datetime import timedelta
from pathlib import Path

from feast import Feature, FeatureView, SnowflakeSource, ValueType

name = Path(__file__).stem

iris_source = SnowflakeSource(
    schema="MY_SCHEMA",
    database="MY_DATABASE",
    table="IRIS",
    timestamp_field="EVENT_TIMESTAMP",
    created_timestamp_column="CREATE_TIMESTAMP"
)

iris = FeatureView(
    name=name.upper(),
    entities=[],
    ttl=timedelta(weeks=52),
    schema=[
        Feature(name="PETAL_LENGTHS", dtype=ValueType.FLOAT),
        Feature(name="PETAL_WIDTH", dtype=ValueType.FLOAT),
        Feature(name="SEPAL_LENGTH", dtype=ValueType.FLOAT),
        Feature(name="SEPAL_WIDTH", dtype=ValueType.FLOAT),
        Feature(name="SPECIES", dtype=ValueType.INT64),
    ],
    online=True,
    source=iris_source,
    description="A sample feature view containing the Iris dataset."
)

Specifications

  • Version: 0.2.0
  • Platform: AWS EC2 (Linux2 AMI)
  • Subsystem:

Possible Solution

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions