Feast push (Redshift/DynamoDB) does not work with PushMode.ONLINE_AND_OFFLINE when there are more than 500 columns #3282

@beubeu13220

Description

Expected Behavior

Currently, we have a push source with a Redshift offline store and a DynamoDB online store.
We built our feature view with more than 500 columns (around 750).

We expect data to be ingested into both DynamoDB and Redshift when we run:
fs.push("push_source", df, to=PushMode.ONLINE_AND_OFFLINE)

Current Behavior

The push command raises an error like [ERROR] ValueError: The input dataframe has columns ..
This error comes from the get_table_column_names_and_types method called inside write_to_offline_store.
In that method, we check whether set(input_columns) != set(source_columns) and raise the error above if they differ.

With more than 500 columns there is always a difference, because source_columns is built from the result of get_table_column_names_and_types, and that result is capped by the MaxResults parameter of the underlying describe_table call.
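For illustration, here is a minimal sketch of the truncation using the boto3 redshift-data client directly; the connection details are hypothetical and this is not the exact Feast source.

# Not the exact Feast code: shows how reading only the first page of
# describe_table truncates the column list for a wide table.
import boto3

client = boto3.client("redshift-data")  # assumes AWS credentials are configured

first_page = client.describe_table(
    ClusterIdentifier="my-cluster",  # hypothetical connection details
    Database="my_database",
    DbUser="my_user",
    Schema="public",
    Table="fs_push_view",
)
source_columns = [col["name"] for col in first_page["ColumnList"]]
print(len(source_columns))  # only the first page (reportedly ~500 here), not the full 750+

# write_to_offline_store then does roughly:
# if set(input_columns) != set(source_columns):
#     raise ValueError("The input dataframe has columns ...")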

Steps to reproduce

from feast import (
    Entity,
    FeatureView,
    Field,
    PushSource,
    RedshiftSource,
    ValueType,
    types,
)
from feast.data_source import PushMode

entity = Entity(
    name="entity",
    join_keys=["entity_id"],
    value_type=ValueType.INT64,
)

push_source = PushSource(
    name="push_source",
    batch_source=RedshiftSource(
        table="fs_push_view",
        timestamp_field="datecreation",
        created_timestamp_column="created_at",
    ),
)

besoin_embedding_push_view = FeatureView(
    name="push_view",
    entities=[entity],
    schema=[Field(name=f"field_{dim}", dtype=types.Float64) for dim in range(768)],
    source=push_source,
)
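The report does not show how df or fs are built; here is a hypothetical dataframe with the expected columns (fs is assumed to be a FeatureStore to which the objects above have already been applied).

from datetime import datetime, timezone

import numpy as np
import pandas as pd

num_rows = 10
# 768 feature columns plus the join key and the two timestamp columns.
df = pd.DataFrame({f"field_{dim}": np.random.rand(num_rows) for dim in range(768)})
df["entity_id"] = np.arange(num_rows, dtype=np.int64)
df["datecreation"] = datetime.now(timezone.utc)
df["created_at"] = datetime.now(timezone.utc)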

fs.push("push_source", df, to=PushMode.ONLINE_AND_OFFLINE)

Specifications

  • Version: 0.25.0
  • Platform: AWS
  • Subsystem:

Possible Solution

In my mind, there are two possible solutions:

  • Set a higher MaxResults in the describe_table call
  • Use NextToken to paginate through all results (see the sketch below)
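
A minimal sketch of the second option, assuming the boto3 redshift-data client and hypothetical connection details:

import boto3

def get_all_columns(client, **describe_kwargs):
    # Follow NextToken until every column description has been collected.
    columns = []
    next_token = None
    while True:
        if next_token:
            response = client.describe_table(NextToken=next_token, **describe_kwargs)
        else:
            response = client.describe_table(**describe_kwargs)
        columns.extend(response.get("ColumnList", []))
        next_token = response.get("NextToken")
        if not next_token:
            return columns

client = boto3.client("redshift-data")
columns = get_all_columns(
    client,
    ClusterIdentifier="my-cluster",  # hypothetical connection details
    Database="my_database",
    DbUser="my_user",
    Schema="public",
    Table="fs_push_view",
)
print(len(columns))  # now all 750+ columns, matching the input dataframe

Pagination seems the more robust option, since any fixed MaxResults still has a service-side cap.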
