Description
Expected Behavior
Loading the feature_store.yaml file from within a Bytewax pod should work.
Current Behavior
yaml.safe_load() raises an error while trying to reconstruct the object below:
- pathlib.PosixPath
The error occurs while running materialization using Bytewax at the point where the feature_store.yaml is loaded. The code where this happens is in sdk/python/feast/infra/materialization/contrib/bytewax/dataflow.py. Below is an excerpt:
# ...
with open("/var/feast/feature_store.yaml") as f:
    feast_config = yaml.safe_load(f)  # <---- yaml.safe_load() fails
# ...
The exact message is as below:
Defaulted container "process" out of: process, init-hostfile (init)
Feast is an open source project that collects anonymized error reporting and usage statistics. To opt out or learn more see https://docs.feast.dev/reference/usage
Traceback (most recent call last):
File "/bytewax/dataflow.py", line 15, in <module>
feast_config = yaml.safe_load(f)
File "/usr/local/lib/python3.9/site-packages/yaml/__init__.py", line 125, in safe_load
return load(stream, SafeLoader)
File "/usr/local/lib/python3.9/site-packages/yaml/__init__.py", line 81, in load
return loader.get_single_data()
File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 51, in get_single_data
return self.construct_document(node)
File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 60, in construct_document
for dummy in generator:
File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 413, in construct_yaml_map
value = self.construct_mapping(node)
File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 218, in construct_mapping
return super().construct_mapping(node, deep=deep)
File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 143, in construct_mapping
value = self.construct_object(value_node, deep=deep)
File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 100, in construct_object
data = constructor(self, node)
File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 427, in construct_undefined
raise ConstructorError(None, None,
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object/apply:pathlib.PosixPath'
  in "/var/feast/feature_store.yaml", line 119, column 12
Interestingly, method _create_configuration_map() of class BytewaxMaterializationEngine uses yaml.dump() instead of yaml.safe_dump() to write the config in the first place:
# ...
def _create_configuration_map(self, job_id, paths, feature_view, namespace):
    """Create a Kubernetes configmap for this job"""
    feature_store_configuration = yaml.dump(self.repo_config.dict())
    # ...
When I tried to replace yaml.dump() with yaml.safe_dump(), I got the following error:
yaml.representer.RepresenterError: ('cannot represent an object', <RedisType.redis: 'redis'>)
It appears that yaml.SafeDumper and yaml.SafeLoader cannot find the appropriate representers and/or constructors for RedisType.redis and pathlib.PosixPath. Perhaps those objects do not have corresponding to_yaml() and from_yaml() methods.
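The behaviour described above can be reproduced outside of Feast. The snippet below is a minimal sketch, assuming only that PyYAML is installed; `StoreType` and the `config` dict are hypothetical stand-ins for `RedisType` and the output of `repo_config.dict()`, not Feast code.

```python
import enum
import pathlib

import yaml  # PyYAML


class StoreType(str, enum.Enum):
    """Hypothetical stand-in for an enum-valued config field such as RedisType."""

    redis = "redis"


# Hypothetical config with the two kinds of values that cause trouble.
config = {
    "registry_path": pathlib.PurePosixPath("/var/feast/registry.db"),
    "store_type": StoreType.redis,
}

# yaml.dump() serializes arbitrary Python objects using Python-specific
# tags (python/object/apply:...), so this call succeeds.
dumped = yaml.dump(config)

# yaml.safe_load() has no constructor for those tags and refuses to load them.
try:
    yaml.safe_load(dumped)
    load_error = None
except yaml.constructor.ConstructorError as exc:
    load_error = exc

# yaml.safe_dump() has no representer for the same objects and refuses to dump them.
try:
    yaml.safe_dump(config)
    dump_error = None
except yaml.representer.RepresenterError as exc:
    dump_error = exc

print(dumped)
print("safe_load failed:", load_error is not None)
print("safe_dump failed:", dump_error is not None)
```

In other words, the safe loader and safe dumper only handle plain YAML types, so a file written with yaml.dump() is only guaranteed to be readable by yaml.safe_load() if the data contained nothing but plain types to begin with.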
Steps to reproduce
Run the materialization:
feast materialize --views "EXAMPLE_FEATURE_VIEW" '2023-10-30T00:00:00' '2023-10-30T23:59:59'
Give it some time and check the pods:
kubectl get pods -n bytewax
NAME                                                    READY   STATUS   RESTARTS   AGE
dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-0-9kxgt   0/1     Error    0          25s
dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-1-d8n4r   0/1     Error    0          25s
dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-2-wmmsd   0/1     Error    0          25s
dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-3-c8gn7   0/1     Error    0          25s
dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-4-hgfbn   0/1     Error    0          25s
Then upon inspecting the logs, I see the error from above:
kubectl logs -n bytewax dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-4-hgfbn
Specifications
- Version: feast==0.35.0, pyyaml==6.0.1, bytewax==0.15.1
- Platform:
  - local: macOS
  - bytewax image: a custom build based on the Dockerfile from the feast-dev repo (sdk/python/feast/infra/materialization/contrib/bytewax/Dockerfile), where feast is installed with support for AWS, Bytewax, Redis (as the online store), and Postgres (as the SQL registry)
Possible Solution
I was able to make it work by modifying sdk/python/feast/infra/materialization/contrib/bytewax/dataflow.py to use yaml.load() instead of yaml.safe_load() and rebuilding the Bytewax docker image:
with open("/var/feast/feature_store.yaml") as f:
    # feast_config = yaml.safe_load(f)
    feast_config = yaml.load(f, Loader=yaml.Loader)
with open("/var/feast/bytewax_materialization_config.yaml") as b:
    # I did not test if yaml.safe_load() works for the bytewax config, but just went ahead and replaced it too
    # bytewax_config = yaml.safe_load(b)
    bytewax_config = yaml.load(b, Loader=yaml.Loader)
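Note that yaml.Loader can construct arbitrary Python objects, so it should only be used on files you trust. An alternative that might avoid the unsafe loader is to convert the config to plain types before writing it, so that yaml.safe_dump() and yaml.safe_load() both work. The following is only a sketch of that idea: `to_plain`, `StoreType`, and the sample `config` are hypothetical names, not part of Feast, and I have not checked whether the loaded config would then be accepted unchanged on the Bytewax side.

```python
import enum
import pathlib
from typing import Any


def to_plain(value: Any) -> Any:
    """Recursively convert a value into types that yaml.safe_dump() accepts."""
    # Check enums first: a str-based enum is also an instance of str.
    if isinstance(value, enum.Enum):
        return to_plain(value.value)
    if isinstance(value, pathlib.PurePath):
        return str(value)
    if isinstance(value, dict):
        return {to_plain(k): to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    return value


class StoreType(str, enum.Enum):
    """Hypothetical stand-in for an enum-valued config field such as RedisType."""

    redis = "redis"


# Hypothetical config resembling the output of repo_config.dict().
config = {
    "registry": {"path": pathlib.PurePosixPath("/var/feast/registry.db")},
    "online_store": {"type": "redis", "redis_type": StoreType.redis},
    "hosts": ("a", "b"),
}

plain = to_plain(config)
print(plain)
# The result contains only dicts, lists, and strings, so it could be passed
# to yaml.safe_dump(plain) and later read back with yaml.safe_load().
```

With this approach paths come back as strings and enum members as their values, so whatever consumes the loaded config would have to accept those forms.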