
Bytewax materialization engine fails when loading feature_store.yaml #3893

@gterziysky

Description

Expected Behavior

Loading the feature_store.yaml file from within a Bytewax pod should work.

Current Behavior

yaml.safe_load() raises an error while trying to reconstruct the object below:

  • pathlib.PosixPath

The error occurs while running materialization using Bytewax at the point where the feature_store.yaml is loaded. The code where this happens is in sdk/python/feast/infra/materialization/contrib/bytewax/dataflow.py. Below is an excerpt:

    # ...
    with open("/var/feast/feature_store.yaml") as f:
        feast_config = yaml.safe_load(f)  # <---- yaml.safe_load() fails
    # ...

The full error message is:

Defaulted container "process" out of: process, init-hostfile (init)
Feast is an open source project that collects anonymized error reporting and usage statistics. To opt out or learn more see https://docs.feast.dev/reference/usage
Traceback (most recent call last):
  File "/bytewax/dataflow.py", line 15, in <module>
    feast_config = yaml.safe_load(f)
  File "/usr/local/lib/python3.9/site-packages/yaml/__init__.py", line 125, in safe_load
    return load(stream, SafeLoader)
  File "/usr/local/lib/python3.9/site-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 51, in get_single_data
    return self.construct_document(node)
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 60, in construct_document
    for dummy in generator:
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 413, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 218, in construct_mapping
    return super().construct_mapping(node, deep=deep)
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 143, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 100, in construct_object
    data = constructor(self, node)
  File "/usr/local/lib/python3.9/site-packages/yaml/constructor.py", line 427, in construct_undefined
    raise ConstructorError(None, None,
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object/apply:pathlib.PosixPath'
  in "/var/feast/feature_store.yaml", line 119, column 12
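
The failure can also be reproduced without Kubernetes or Bytewax. Below is a minimal sketch that assumes only PyYAML. The config dict and its keys are made up for illustration and only loosely mirror what repo_config.dict() appears to produce.

```python
import pathlib

import yaml

# Made-up config dict holding a pathlib object, loosely mirroring what
# repo_config.dict() appears to produce for path-valued fields.
config = {"registry": {"path": pathlib.Path("/var/feast/registry.db")}}

# yaml.dump() uses the full Dumper, which serializes the Path with a
# Python-specific tag (e.g. python/object/apply:pathlib.PosixPath on Linux).
text = yaml.dump(config)
print(text)

# yaml.safe_load() has no constructor for that tag and raises the same
# kind of ConstructorError that appears in the pod logs.
error = None
try:
    yaml.safe_load(text)
except yaml.constructor.ConstructorError as exc:
    error = exc
print(error.problem if error else "loaded without error")
```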

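Interestingly, the _create_configuration_map() method of the BytewaxMaterializationEngine class writes the config with yaml.dump() rather than yaml.safe_dump(). For objects that have no standard YAML representation, yaml.dump() falls back to Python-specific tags such as python/object/apply:pathlib.PosixPath, and yaml.safe_load() cannot construct those tags: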

    # ...
    def _create_configuration_map(self, job_id, paths, feature_view, namespace):
        """Create a Kubernetes configmap for this job"""

        feature_store_configuration = yaml.dump(self.repo_config.dict())
    # ...

When I replaced yaml.dump() with yaml.safe_dump(), I got the following error:

yaml.representer.RepresenterError: ('cannot represent an object', <RedisType.redis: 'redis'>)

It appears that yaml.SafeDumper and yaml.SafeLoader only handle standard YAML types (strings, numbers, booleans, lists, dicts and so on). They have no representer or constructor registered for RedisType.redis (an enum member) or pathlib.PosixPath, so the round trip fails on both the writing and the reading side.
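
One possible writer-side fix is to register representers for these types on a SafeDumper subclass. The configmap would then contain only plain YAML that yaml.safe_load() accepts. This is a minimal sketch that assumes repo_config.dict() returns enum members and pathlib paths, as the two errors suggest. ConfigDumper, dump_config and the RedisType stand-in are hypothetical names, not Feast code. Hand-written feature_store.yaml files already store these fields as plain strings, so RepoConfig presumably accepts strings for them. That is an assumption here and has not been verified.

```python
import enum
import pathlib

import yaml


class RedisType(enum.Enum):
    # Hypothetical stand-in for Feast's RedisType enum, for illustration only.
    redis = "redis"
    redis_cluster = "redis_cluster"


class ConfigDumper(yaml.SafeDumper):
    """SafeDumper that writes enum members and paths as plain YAML scalars."""


def _represent_enum(dumper, data):
    # Write the enum's underlying value (e.g. "redis") instead of a Python tag.
    return dumper.represent_data(data.value)


def _represent_path(dumper, data):
    # Write paths as ordinary strings.
    return dumper.represent_str(str(data))


# Multi-representers also match subclasses (RedisType, PosixPath, WindowsPath, ...).
# Registering on the subclass leaves yaml.SafeDumper itself unchanged.
ConfigDumper.add_multi_representer(enum.Enum, _represent_enum)
ConfigDumper.add_multi_representer(pathlib.PurePath, _represent_path)


def dump_config(config: dict) -> str:
    return yaml.dump(config, Dumper=ConfigDumper)


config = {
    "online_store": {"redis_type": RedisType.redis},
    "registry": {"path": pathlib.PurePosixPath("/var/feast/registry.db")},
}
print(dump_config(config))
```

Subclassing rather than calling yaml.SafeDumper.add_multi_representer() directly keeps the change local to this one dump call.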

Steps to reproduce

Run the materialization:

feast materialize --views "EXAMPLE_FEATURE_VIEW" '2023-10-30T00:00:00' '2023-10-30T23:59:59'

Give it some time and check the pods:

kubectl get pods -n bytewax
NAME                                                    READY   STATUS   RESTARTS   AGE
dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-0-9kxgt   0/1     Error    0          25s
dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-1-d8n4r   0/1     Error    0          25s
dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-2-wmmsd   0/1     Error    0          25s
dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-3-c8gn7   0/1     Error    0          25s
dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-4-hgfbn   0/1     Error    0          25s

The logs of any of the failed pods then show the error above:

kubectl logs -n bytewax dataflow-4f3a7567-7cc9-4188-9fb1-cfc614451c35-4-hgfbn

Possible Solution

I got it working by modifying sdk/python/feast/infra/materialization/contrib/bytewax/dataflow.py to use yaml.load() with yaml.Loader instead of yaml.safe_load(), then rebuilding the Bytewax Docker image:

    import yaml

    with open("/var/feast/feature_store.yaml") as f:
        # feast_config = yaml.safe_load(f)
        feast_config = yaml.load(f, Loader=yaml.Loader)

        with open("/var/feast/bytewax_materialization_config.yaml") as b:
            # I did not test whether yaml.safe_load() works for the Bytewax config,
            # but replaced it as well.
            # bytewax_config = yaml.safe_load(b)
            bytewax_config = yaml.load(b, Loader=yaml.Loader)
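
yaml.Loader will construct arbitrary Python objects from tags in the file, which is more permissive than this config needs. If the goal is only to read configmaps already written by yaml.dump(), a narrower reader-side option is a SafeLoader subclass that accepts python/object/apply tags for an explicit allowlist and converts them to plain values. In this sketch the names ConfigLoader, ALLOWED_APPLY and APPLY_PREFIX are hypothetical. Only pathlib.PosixPath is listed. The RedisType enum would need its own entry, keyed by whatever tag actually appears in the generated file, and that tag is not shown in this issue.

```python
import pathlib
from typing import Any, Callable, Dict, List

import yaml

APPLY_PREFIX = "tag:yaml.org,2002:python/object/apply:"

# Hypothetical allowlist: tag suffix -> function turning the constructor's
# positional arguments into a plain value. Any other python/object/apply
# tag still raises ConstructorError.
ALLOWED_APPLY: Dict[str, Callable[[List[Any]], Any]] = {
    "pathlib.PosixPath": lambda args: str(pathlib.PurePosixPath(*args)),
}


class ConfigLoader(yaml.SafeLoader):
    """SafeLoader that also accepts a small allowlist of python/object/apply tags."""


def _construct_allowed_apply(loader, tag_suffix, node):
    converter = ALLOWED_APPLY.get(tag_suffix)
    if converter is None or not isinstance(node, yaml.SequenceNode):
        raise yaml.constructor.ConstructorError(
            None, None, f"tag {APPLY_PREFIX + tag_suffix!r} is not allowed", node.start_mark
        )
    # The tag's sequence holds the positional arguments, e.g. ["/", "var", ...].
    return converter(loader.construct_sequence(node, deep=True))


# Registering on the subclass leaves yaml.SafeLoader itself unchanged.
ConfigLoader.add_multi_constructor(APPLY_PREFIX, _construct_allowed_apply)

doc = """\
registry:
  path: !!python/object/apply:pathlib.PosixPath
    - /
    - var
    - feast
    - registry.db
"""
print(yaml.load(doc, Loader=ConfigLoader))
```

The trade-off is that the loaded config contains a string where a Path was dumped. A writer-side fix avoids the custom tags altogether.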
