Description
Expected Behavior
The Spark offline store should work for Python 3.7
Current Behavior
The Spark offline store does not pass integration tests on Python 3.7 (but does pass on 3.8).
Steps to reproduce
Running a specific integration test, e.g.
PYTHONPATH='.' FULL_REPO_CONFIGS_MODULE=sdk.python.feast.infra.offline_stores.contrib.contrib_repo_configuration IS_TEST=True pytest -s --integration sdk/python/tests/integration/offline_store/test_universal_historical_retrieval.py::test_historical_features_with_missing_request_data
yields _pickle.PicklingError: Could not serialize object: ValueError: Cell is empty.
The full stack trace looks like:
Traceback (most recent call last):
File "/Users/felixwang/feast/env/lib/python3.7/site-packages/pyspark/serializers.py", line 437, in dumps
return cloudpickle.dumps(obj, pickle_protocol)
File "/Users/felixwang/feast/env/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 102, in dumps
cp.dump(obj)
File "/Users/felixwang/feast/env/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 563, in dump
return Pickler.dump(self, obj)
File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 437, in dump
self.save(obj)
File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 789, in save_tuple
save(element)
File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/Users/felixwang/feast/env/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 745, in save_function
*self._dynamic_function_reduce(obj), obj=obj
File "/Users/felixwang/feast/env/lib/python3.7/site-packages/pyspark/cloudpickle/cloudpickle_fast.py", line 682, in _save_reduce_pickle5
dictitems=dictitems, obj=obj
File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 638, in save_reduce
save(args)
File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 789, in save_tuple
save(element)
File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 774, in save_tuple
save(element)
File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/Users/felixwang/feast/env/lib/python3.7/site-packages/dill/_dill.py", line 1226, in save_cell
f = obj.cell_contents
ValueError: Cell is empty
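The final frame fails because `save_cell` reads `obj.cell_contents` unconditionally, and an empty closure cell (one whose variable was never assigned) raises exactly this `ValueError`. A minimal standalone sketch of that last step, independent of dill or pyspark:

```python
def make_empty_cell():
    # Build a closure over a variable that is never assigned, which leaves
    # the underlying cell object empty.
    def outer():
        if False:
            x = None  # never executed, so the cell for x is never filled

        def inner():
            return x  # closes over x, forcing a cell to exist

        return inner

    return outer().__closure__[0]


cell = make_empty_cell()
try:
    cell.cell_contents  # the same read dill's save_cell performs
except ValueError as exc:
    print(exc)  # ValueError: Cell is empty
```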
I believe the specific issue is a conflict between dill and cloudpickle: pyspark uses cloudpickle as its default serializer, whereas Feast uses dill in various places. See this for someone else who had a similar issue and this for more details on the conflicts between dill and cloudpickle.
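One way such a conflict can arise (a hedged illustration with hypothetical `LibraryAPickler`/`LibraryBPickler` classes, not the actual dill or cloudpickle code): the pure-Python `pickle._Pickler` keeps its reducers in a single class-level `dispatch` dict, so a subclass that mutates the inherited dict in place leaks its reducers to every other subclass. That is the shape of the failure visible in the traceback, where cloudpickle's `save_function` and dill's `save_cell` end up consulted by the same pickler:

```python
import pickle


class LibraryAPickler(pickle._Pickler):
    pass  # inherits the stock Pickler's class-level dispatch dict


class LibraryBPickler(pickle._Pickler):
    pass  # inherits the very same dict object


def custom_save_str(pickler, obj):
    # Stand-in reducer; the body is irrelevant to the demonstration.
    pickler.save_reduce(str, (obj,), obj=obj)


original = pickle._Pickler.dispatch[str]
try:
    # Library A "registers" a reducer by mutating the inherited dict...
    LibraryAPickler.dispatch[str] = custom_save_str
    # ...and Library B now sees the foreign reducer too.
    print(LibraryBPickler.dispatch[str] is custom_save_str)  # True
finally:
    pickle._Pickler.dispatch[str] = original  # undo the global mutation
```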
For some reason, switching to Python 3.8 immediately solved this problem for me. I'm not sure if this is because dill is especially brittle with Python 3.7; maybe #1971 is related to this?
Specifications
- Version: Feast 0.20.1.dev, up to commit 00ed65a77177cfe04877e9550d1c8c1e903dadf8 on master
- Platform:
- Subsystem: