Description
Expected Behavior
When running feast apply --skip-source-validation, the data source validation should be skipped.
Current Behavior
When I run feast apply with --skip-source-validation, it still calls store.plan(repo) (see apply_total_with_repo_instance in feast/repo_operations.py in the traceback below), which invokes _make_inferences() and then _infer_features_and_entities(). _infer_features_and_entities() calls get_table_column_names_and_types(), which is exactly the same function that validate() in the Spark source calls.
This means the validation is not effectively skipped: get_table_column_names_and_types() is called either way, so the "table not found" error occurs even when running feast apply --skip-source-validation.
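For illustration, here is a minimal, self-contained sketch of the two converging call paths, paraphrased from the traceback below. The SparkSource class and the standalone infer_features_and_entities function are simplified stand-ins for feast_spark/spark_source.py and feast/inference.py, not the actual Feast code:

# Sketch of the converging call paths, paraphrased from the traceback
# below; simplified stand-ins, not the real Feast classes.
from pyspark.sql import SparkSession

spark_session = SparkSession.builder.getOrCreate()

class SparkSource:
    def __init__(self, table: str):
        self.table = table

    def get_table_query_string(self) -> str:
        return self.table

    def get_table_column_names_and_types(self):
        # Queries the table, so this raises AnalysisException
        # [TABLE_OR_VIEW_NOT_FOUND] when the table does not exist.
        df = spark_session.sql(f"SELECT * FROM {self.get_table_query_string()}")
        return [(field.name, field.dataType.simpleString()) for field in df.schema.fields]

    def validate(self):
        # This is the call that --skip-source-validation suppresses ...
        self.get_table_column_names_and_types()

def infer_features_and_entities(source: SparkSource):
    # ... but schema inference, which store.plan() -> _make_inferences()
    # runs unconditionally, calls the same method, so the missing table
    # is still queried regardless of the flag.
    return source.get_table_column_names_and_types()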
The error log is as follows:
Traceback (most recent call last):
  File "/opt/homebrew/anaconda3/envs/feast/bin/feast", line 8, in <module>
    sys.exit(cli())
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/cli.py", line 519, in apply_total_command
    apply_total(repo_config, repo, skip_source_validation)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/usage.py", line 283, in wrapper
    return func(*args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/repo_operations.py", line 335, in apply_total
    apply_total_with_repo_instance(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/repo_operations.py", line 296, in apply_total_with_repo_instance
    registry_diff, infra_diff, new_infra = store.plan(repo)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/usage.py", line 283, in wrapper
    return func(*args, **kwargs)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/feature_store.py", line 724, in plan
    self._make_inferences(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/feature_store.py", line 602, in _make_inferences
    update_feature_views_with_inferred_features_and_entities(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/inference.py", line 168, in update_feature_views_with_inferred_features_and_entities
    _infer_features_and_entities(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast/inference.py", line 206, in _infer_features_and_entities
    table_column_names_and_types = fv.batch_source.get_table_column_names_and_types(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/feast_spark/spark_source.py", line 161, in get_table_column_names_and_types
    df = spark_session.sql(f"SELECT * FROM {self.get_table_query_string()}")
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/pyspark/sql/session.py", line 1440, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/opt/homebrew/anaconda3/envs/feast/lib/python3.8/site-packages/pyspark/errors/exceptions/captured.py", line 175, in deco
    raise converted from None
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `XXXXXX` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; line 18 pos 9;
Steps to reproduce
This can only be reproduced in our internal environment, where permission controls are in place.
Specifications
- Version: 0.31.1
- Platform: Linux
- Subsystem: Ubuntu
Possible Solution
The logic/implementation of skipping source validation needs to be re-investigated, so that the flag actually prevents the data source from being queried.
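One possible direction, as a hypothetical sketch only (the skip_source_inference flag below does not exist in Feast 0.31.1; it merely illustrates how the existing --skip-source-validation flag could be threaded from apply_total() through plan()/_make_inferences() down to inference):

# Hypothetical sketch: skip_source_inference is not a real Feast
# parameter. It shows one way inference could avoid touching the data
# source when the user has already declared a schema.

def infer_schema(feature_view, skip_source_inference: bool = False):
    if skip_source_inference and feature_view.schema:
        # Trust the schema declared on the FeatureView instead of
        # reading it from the (possibly inaccessible) source table.
        return [(field.name, field.dtype) for field in feature_view.schema]
    # Current behavior: always query the source, which is what raises
    # TABLE_OR_VIEW_NOT_FOUND even with --skip-source-validation.
    return feature_view.batch_source.get_table_column_names_and_types()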