Move DataSource validation to OfflineStore #4186

@tokoko

Description

The DataSource abstract interface contains methods such as validate and get_table_column_names_and_types, for validation and schema extraction respectively. This sits poorly with the separation of concerns between DataSource and OfflineStore: DataSource is supposed to be a static description of a source dataset, while OfflineStore is an engine that knows how to read one or more data source types. Keeping these methods on DataSource classes means that data sources must also somehow be able to access the underlying data.

I propose moving the validate method to the OfflineStore abstract class as validate_data_source. This also makes sense for scenarios where a single source can be read by multiple offline stores. For example, FileSource (which can be read by dask, duckdb and probably spark in the future) is currently validated with pyarrow, instead of leaving it up to each OfflineStore to choose how to validate the source.
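To make the proposed split concrete, here is a rough sketch of what it could look like. It does not use the real Feast classes: the simplified `DataSource`, `FileSource` and `OfflineStore` below, the two store subclasses, and the argument-free `validate_data_source(data_source)` signature are all assumptions made for illustration. The point is only that the source stays a plain description, while each store decides how to validate it.

```python
import csv
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """Hypothetical stand-in: a static description, with no I/O logic."""
    name: str


@dataclass(frozen=True)
class FileSource(DataSource):
    """Hypothetical stand-in for a file-backed source."""
    path: str


class OfflineStore(ABC):
    """Hypothetical stand-in: the engine that knows how to read sources."""

    @abstractmethod
    def validate_data_source(self, data_source: DataSource) -> None:
        """Raise an exception if this store cannot read the given source."""


class ExistenceCheckingStore(OfflineStore):
    """Illustrative store that only checks that the file is present."""

    def validate_data_source(self, data_source: DataSource) -> None:
        if not isinstance(data_source, FileSource):
            raise TypeError(f"unsupported source type: {type(data_source).__name__}")
        if not os.path.exists(data_source.path):
            raise FileNotFoundError(data_source.path)


class HeaderCheckingStore(OfflineStore):
    """Illustrative store that validates the same source more strictly."""

    def validate_data_source(self, data_source: DataSource) -> None:
        if not isinstance(data_source, FileSource):
            raise TypeError(f"unsupported source type: {type(data_source).__name__}")
        # Read the first row with this store's own reader (csv here).
        with open(data_source.path, newline="") as f:
            header = next(csv.reader(f), None)
        if not header:
            raise ValueError(f"{data_source.path} has no header row")
```

With this shape, the same FileSource can pass validation in one store and fail in another, which is the flexibility the proposal is after.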
