I am exploring the statsmodels.imputation.mice package for imputing missing values. I haven't seen any examples of its usage outside of http://www.statsmodels.org, though. From what I gather, one creates an instance of mice.MICEData and uses it in conjunction with mice.MICE().fit(). Here is the example from http://www.statsmodels.org/dev/generated/statsmodels.imputation.mice.MICE.html:

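>>> import statsmodels.api as sm
>>> from statsmodels.imputation import mice
>>> # data: a pandas DataFrame with columns y, x1, x2, x3, x4 (some values NaN)
>>> imp = mice.MICEData(data)
>>> fml = 'y ~ x1 + x2 + x3 + x4'
>>> mice_mod = mice.MICE(fml, sm.OLS, imp)  # avoid shadowing the mice module
>>> results = mice_mod.fit(10, 10)  # 10 burn-in cycles, 10 imputations
>>> print(results.summary())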

The imputed values in a MICEData instance are not fixed, though. What I mean is that if I create

imp = mice.MICEData(data)

then every call

imp.update('x1') 

(assuming data has a column 'x1') draws a new sample for the missing values using “predictive mean matching”. That's fine if I use MICEData with MICE.fit(). However, say I want to use this package to impute the missing values once, and then fit the data with a predictor from another package, say sklearn. What would be a reasonable approach? I could run update several times and average the imputed values for each missing entry. Alternatively, I could create several data sets with different imputed values and fit a model to each of them. However, if my data set is huge, that can get pretty expensive.
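A minimal sketch of both options, assuming a small synthetic DataFrame (hypothetical data) in which only x1 has missing values, with sklearn's LinearRegression as the outside predictor. It first checks that two consecutive update('x1') calls give different imputations, then (1) averages several draws into one completed data set, and (2) fits one model per imputed copy and averages the coefficients. The statsmodels docs say MICEData.next_sample() returns a reference to the internal data, hence the .copy() calls; seeding NumPy's global generator is an assumption about where MICEData draws its random numbers.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from statsmodels.imputation import mice

# Hypothetical data: y depends on x1 and x2; about 15% of x1 is missing.
np.random.seed(0)  # in case MICEData draws from NumPy's global generator
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame(rng.normal(size=(n, 3)), columns=["x1", "x2", "x3"])
data["y"] = data["x1"] - 2 * data["x2"] + rng.normal(size=n)
data.loc[rng.random(n) < 0.15, "x1"] = np.nan
miss = data["x1"].isna().to_numpy()  # positions of the missing x1 values
X_cols = ["x1", "x2", "x3"]

imp = mice.MICEData(data)

# Two consecutive updates redraw the same missing cells.
imp.update("x1")
first = imp.data.loc[miss, "x1"].to_numpy().copy()
imp.update("x1")
second = imp.data.loc[miss, "x1"].to_numpy().copy()
print("cells that changed:", np.sum(first != second), "of", miss.sum())

imp.update_all(10)  # burn-in cycles before collecting imputations

# Option 1: single imputation, averaging several draws per missing cell.
draws = []
for _ in range(20):
    imp.update_all()  # one cycle over every incomplete column
    draws.append(imp.data.loc[miss, "x1"].to_numpy().copy())
averaged = data.copy()
averaged.loc[miss, "x1"] = np.mean(draws, axis=0)
single_model = LinearRegression().fit(averaged[X_cols], averaged["y"])
print("option 1 coefficients:", single_model.coef_)

# Option 2: multiple imputation, one model per completed copy, then pool.
m = 5
coefs = []
for _ in range(m):
    completed = imp.next_sample().copy()  # copy: next_sample returns a reference
    model = LinearRegression().fit(completed[X_cols], completed["y"])
    coefs.append(model.coef_)
pooled_coef = np.mean(coefs, axis=0)
print("option 2 pooled coefficients:", pooled_coef)
```

Averaging draws (option 1) pulls each imputed value toward its conditional mean, so any uncertainty estimates from the downstream model will be too optimistic. Option 2 keeps the between-imputation variability, and a small m keeps the cost bounded on a large data set.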
