The --skip-source-validation option not working #3682

@InstructorZhang

Description

Expected Behavior

When running feast apply --skip-source-validation, the data source validation should be skipped.

Current Behavior

When I run feast apply with --skip-source-validation, it calls store.plan(repo) (see code here), which invokes _make_inferences() and then _infer_features_and_entities(). _infer_features_and_entities() in turn calls get_table_column_names_and_types(). The validate() function in the Spark source does the same thing: it also calls get_table_column_names_and_types().

That means the validation is not really skipped: get_table_column_names_and_types() is called anyway, which raises the "table not found" error even when I run feast apply --skip-source-validation.
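The call pattern described above can be reduced to a small, self-contained sketch. The classes and functions below (FakeSparkSource, TableNotFound, apply, infer_features_and_entities) are hypothetical stand-ins written for illustration, not the real Feast code; only the method names validate() and get_table_column_names_and_types() are taken from this report.

```python
# Hypothetical stand-ins for illustration; these are NOT the real Feast classes.
class TableNotFound(Exception):
    """Stands in for the Spark AnalysisException seen in the traceback."""


class FakeSparkSource:
    """Toy data source whose underlying table cannot be reached."""

    def __init__(self):
        self.calls = 0  # how many times the schema lookup was attempted

    def get_table_column_names_and_types(self):
        self.calls += 1
        raise TableNotFound("table or view cannot be found")

    def validate(self):
        # Validation is implemented through the schema lookup.
        self.get_table_column_names_and_types()


def infer_features_and_entities(source):
    # Inference needs the schema too, so it performs the same lookup.
    return source.get_table_column_names_and_types()


def apply(source, skip_source_validation):
    if not skip_source_validation:
        source.validate()
    # This step runs regardless of the flag.
    infer_features_and_entities(source)


source = FakeSparkSource()
try:
    apply(source, skip_source_validation=True)
    outcome = "ok"
except TableNotFound:
    outcome = "table not found"
print(outcome)
```

In this toy model the flag only removes the explicit validate() call; the inference step still reaches the table once, so the same error surfaces.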

The error log is as follows:

Traceback (most recent call last):
  File "/opt/homebrew/anaconda3/envs/feast/bin/feast", line 8, in <module>
    sys.exit(cli())
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/cli.py", line 519, in apply_total_command
    apply_total(repo_config, repo, skip_source_validation)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/usage.py", line 283, in wrapper
    return func(*args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/repo_operations.py", line 335, in apply_total
    apply_total_with_repo_instance(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/repo_operations.py", line 296, in apply_total_with_repo_instance
    registry_diff, infra_diff, new_infra = store.plan(repo)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/usage.py", line 283, in wrapper
    return func(*args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/feature_store.py", line 724, in plan
    self._make_inferences(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/feature_store.py", line 602, in _make_inferences
    update_feature_views_with_inferred_features_and_entities(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/inference.py", line 168, in update_feature_views_with_inferred_features_and_entities
    _infer_features_and_entities(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/inference.py", line 206, in _infer_features_and_entities
    table_column_names_and_types = fv.batch_source.get_table_column_names_and_types(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast_spark/spark_source.py", line 161, in get_table_column_names_and_types
    df = spark_session.sql(f"SELECT * FROM {self.get_table_query_string()}")
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/pyspark/sql/session.py", line 1440, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/pyspark/errors/exceptions/captured.py", line 175, in deco
    raise converted from None
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `XXXXXX` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; line 18 pos 9;

Steps to reproduce

I can only reproduce this in our internal environment, where permission controls are in place.

Specifications

  • Version: 0.31.1
  • Platform: Linux
  • Subsystem: Ubuntu

Possible Solution

The logic/implementation of skipping source validation needs to be re-investigated.
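One direction, sketched only as an assumption and not as the actual Feast fix, would be to pass the flag down to the inference step so that it relies on the declared schema instead of querying the source. Every name below (UnreachableSource, make_inferences, declared_schema) is hypothetical, and the column names are made up for the example.

```python
# Hypothetical sketch; not the real Feast API or its actual fix.
class TableNotFound(Exception):
    """Stands in for the Spark error raised when the table is unreachable."""


class UnreachableSource:
    """Toy data source that always fails on schema lookup."""

    def get_table_column_names_and_types(self):
        raise TableNotFound("table or view cannot be found")


def make_inferences(source, declared_schema, skip_source_validation):
    """Return the schema to use for a feature view.

    When the flag is set, trust the schema declared in the repo and
    never touch the data source.
    """
    if skip_source_validation:
        return list(declared_schema)
    return list(source.get_table_column_names_and_types())


declared = [("driver_id", "int"), ("conv_rate", "float")]  # made-up columns
schema = make_inferences(UnreachableSource(), declared, skip_source_validation=True)
print(schema)
```

The trade-off in this sketch is that a feature view without an explicitly declared schema would end up with no inferred features when the flag is set, so a real fix would need to decide how to handle that case.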
