Spark offline store does not work for Python 3.7 #2608

@felixwang9817

Expected Behavior

The Spark offline store should work for Python 3.7

Current Behavior

The Spark offline store does not pass integration tests on Python 3.7 (but does pass on 3.8).

Steps to reproduce

Running a specific integration test with e.g.

PYTHONPATH='.' FULL_REPO_CONFIGS_MODULE=sdk.python.feast.infra.offline_stores.contrib.contrib_repo_configuration IS_TEST=True pytest -s --integration sdk/python/tests/integration/offline_store/test_universal_historical_retrieval.py::test_historical_features_with_missing_request_data

yields _pickle.PicklingError: Could not serialize object: ValueError: Cell is empty.

The full stack trace looks like

Traceback (most recent call last):
  File "/Users/felixwang/feast/env/lib/python3.7/site-packages/pyspark/serializers.py", line 437, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
  File "/Users/felixwang/feast/env/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 102, in dumps
    cp.dump(obj)
  File "/Users/felixwang/feast/env/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 789, in save_tuple
    save(element)
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/Users/felixwang/feast/env/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 745, in save_function
    *self._dynamic_function_reduce(obj), obj=obj
  File "/Users/felixwang/feast/env/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 682, in _save_reduce_pickle5
    dictitems=dictitems, obj=obj
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 638, in save_reduce
    save(args)
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 789, in save_tuple
    save(element)
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 774, in save_tuple
    save(element)
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/Users/felixwang/feast/env/lib/python3.7/site-packages/dill/_dill.py", line 1226, in save_cell
    f = obj.cell_contents
ValueError: Cell is empty

I believe the specific issue is a conflict between dill and cloudpickle: pyspark uses cloudpickle as its default serializer, whereas Feast uses dill in various places. See this for someone else who had a similar issue and this for more details on the conflicts between dill and cloudpickle.
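For context on the last frame of the trace: `ValueError: Cell is empty` is what CPython raises when `cell_contents` is read from a closure cell whose variable has not been assigned yet, and that is the attribute dill's `save_cell` reads in the trace above. The following is a minimal stdlib-only sketch of that behaviour; `make_empty_cell` and `read_cell` are hypothetical helper names used for illustration and are not part of Feast, dill, or pyspark.

```python
def make_empty_cell():
    """Return a closure cell whose variable was never assigned."""
    if False:
        value = None  # never runs, but makes `value` a local of this function

    def inner():
        return value  # captures `value` as a free variable

    # The cell exists because `inner` closes over `value`, yet nothing was stored in it.
    return inner.__closure__[0]


def read_cell(cell):
    """Read a cell the way a naive serializer would, reporting any failure as text."""
    try:
        return cell.cell_contents
    except ValueError as exc:
        return f"ValueError: {exc}"


def make_filled_cell(x):
    """Return a closure cell that does hold a value, for comparison."""
    def inner():
        return x

    return inner.__closure__[0]


if __name__ == "__main__":
    print(read_cell(make_filled_cell(1)))
    print(read_cell(make_empty_cell()))
```

Any serializer that reads `cell_contents` unconditionally will fail on such a cell, so the error depends on which library's cell handler ends up being called.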

For some reason, switching to Python 3.8 immediately solved this problem for me. I'm not sure if this is because dill is especially brittle with Python 3.7; maybe #1971 is related?
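The trace shows `pickle.py`'s `save` dispatching into `dill/_dill.py` `save_cell` while cloudpickle is doing the dumping, which suggests that a dill handler ended up in the type dispatch table used by the pure-Python pickler. One way to check this, sketched below under assumptions, is to list the entries of `pickle._Pickler.dispatch` whose handler is defined in a given module. `pickle._Pickler.dispatch` is a private CPython detail, `handlers_from` is a hypothetical helper name, and this has not been run against the environment in this issue.

```python
import pickle


def handlers_from(module_prefix):
    """List (type name, handler name) pairs in the pure-Python pickler's
    dispatch table whose handler is defined in `module_prefix` or a submodule."""
    found = []
    # pickle._Pickler.dispatch maps a type to the function that saves it.
    for typ, handler in pickle._Pickler.dispatch.items():
        mod = getattr(handler, "__module__", "") or ""
        if mod == module_prefix or mod.startswith(module_prefix + "."):
            name = getattr(handler, "__qualname__", repr(handler))
            found.append((typ.__name__, f"{mod}.{name}"))
    return sorted(found)


if __name__ == "__main__":
    # Stock entries come from the `pickle` module itself.
    for entry in handlers_from("pickle"):
        print(entry)
```

Calling `handlers_from("dill")` after `import dill` in the failing Python 3.7 environment would show whether dill has registered handlers there, and comparing the result on Python 3.8 might help narrow down why the behaviour differs.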

Specifications

Possible Solution
