Hi, I tried training an ExplainableBoostingRegressor using Dask arrays, but I keep running into the following error:
ERROR:interpret.utils.all:Could not unify data of type: <class 'dask.array.core.Array'>
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-53-b5e1d33190e1> in <module>
1 model = create_model()
2
----> 3 fold_models, train_preds, valid_preds = model_cv(model)
<ipython-input-48-178d0374bff6> in model_cv(model)
17
---> 18 fold_model = sklearn.clone(model).fit(x_train_fold, y_train_fold)
19
/opt/anaconda/envs/ebm/lib/python3.8/site-packages/interpret/glassbox/ebm/ebm.py in fit(self, X, y)
744 # TODO: PK don't overwrite self.feature_names here (scikit-learn rules), and it's also confusing to
745 # user to have their fields overwritten. Use feature_names_out_ or something similar
--> 746 X, y, self.feature_names, _ = unify_data(
747 X, y, self.feature_names, self.feature_types, missing_data_allowed=False
748 )
/opt/anaconda/envs/ebm/lib/python3.8/site-packages/interpret/utils/all.py in unify_data(data, labels, feature_names, feature_types, missing_data_allowed)
325 msg = "Could not unify data of type: {0}".format(type(data))
326 log.error(msg)
--> 327 raise ValueError(msg)
328
329 new_labels = unify_vector(labels)
ValueError: Could not unify data of type: <class 'dask.array.core.Array'>
Each of my folds is a 2D array consisting of 56 features and occupying ~16GB of memory.
Passing model.fit(X.compute(), y.compute()) runs out of memory after some time, probably because Joblib copies the data around unnecessarily.
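For reference, here is a minimal sketch of the conversion step I am using as a workaround. The helper name `materialize` is my own and not part of interpret or Dask. It only assumes that the input either exposes a `.compute()` method (as Dask arrays do) or is already array-like. Whether fitting with `n_jobs=1` actually avoids the extra copies is a guess on my part, not something I have confirmed.

```python
import numpy as np


def materialize(data, dtype=None):
    """Return `data` as a C-contiguous in-memory NumPy array.

    Hypothetical helper: objects with a `.compute()` method (e.g. Dask
    arrays) are computed first; anything else is passed to NumPy directly.
    """
    if hasattr(data, "compute"):
        data = data.compute()
    # ascontiguousarray avoids a copy when the array is already contiguous
    # and already has the requested dtype.
    return np.ascontiguousarray(data, dtype=dtype)


# Intended usage (not executed here; assumes interpret is installed and
# that X and y are the Dask arrays from the report above):
#
#   from interpret.glassbox import ExplainableBoostingRegressor
#   model = ExplainableBoostingRegressor(n_jobs=1)  # assumption: fewer Joblib workers, fewer copies
#   model.fit(materialize(X), materialize(y))
```

This still needs the whole fold to fit in RAM once, so it only helps if the crash really comes from the parallel workers and not from the data size itself.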