I have a large dataset (4M x 17) with missing values. Two columns are categorical; the rest are numerical. I want to use the mice package for missing value imputation. This is what I tried:

> testMice <- mice(myData[1:100000,]) # runs fine  
> testTot <- predict(testMice, myData)
Error in UseMethod("predict") : 
  no applicable method for 'predict' applied to an object of class "mids"

Running the imputation on the whole dataset was computationally expensive, so I ran it on only the first 100K observations. I am now trying to use that output to impute the whole dataset.

Is there anything wrong with my approach? If yes, what should I do to make it correct? If no, then why am I getting this error?

2 Answers

Neither mice nor Hmisc provides the parameter estimates from the imputation process. Both Amelia and imputeMulti do. With either of those two packages, you can extract the parameter estimates and use them to impute your other observations.

  • Amelia assumes your data are distributed as a multivariate normal (e.g. X \sim N(\mu, \Sigma)).
  • imputeMulti assumes that your data are distributed as a multivariate multinomial. That is, the complete cell counts are distributed as X \sim M(n, \theta), where n is the number of observations.

Fitting can be done as follows, using example data. Where to find the parameter estimates is shown further below.

library(Amelia)
library(imputeMulti)
data(tract2221, package= "imputeMulti")
test_dat2 <- tract2221[, c("gender", "marital_status","edu_attain", "emp_status")]
# fitting
IM_EM <- multinomial_impute(test_dat2, "EM",conj_prior = "non.informative", verbose= TRUE)
amelia_EM <- amelia(test_dat2, m= 1, noms= c("gender", "marital_status","edu_attain", "emp_status"))
  • The parameter estimates from the amelia function are found in amelia_EM$mu and amelia_EM$theta.
  • The parameter estimates in imputeMulti are found in IM_EM@mle_x_y and can be accessed via the get_parameters method.

imputeMulti has noticeably higher imputation accuracy for categorical data than any of the other three packages, though it only accepts multinomial (e.g. factor) data.

All of this information is in the currently unpublished vignette for imputeMulti. The paper has been submitted to JSS and I am awaiting a response before adding the vignette to the package.

1 Comment

A very poor answer IMHO. Good luck trying to use a high quality Amelia model with data of this size. BTW using a multinomial model is well described by the comment I already made, it's essentially a GLM with a multinomial distribution as you would find with multinom or mlogit.

You don't use predict() with mice. It's not a model you're fitting per se. Your imputed results are already there for the 100,000 rows.

If you want data for all rows then you have to put all rows in mice. I wouldn't recommend it though, unless you set it up on a large cluster with dozens of CPU cores.
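
As a minimal sketch of the point above: the imputed values are stored inside the mids object, and complete() from mice returns a filled-in data frame. The example uses the small nhanes data set bundled with mice rather than the asker's data, and the low m and maxit values are assumptions chosen only to keep it quick.

```r
library(mice)

# nhanes ships with mice: a small data frame with missing values.
# m = 2 imputations and maxit = 2 iterations are illustrative settings only.
imp <- mice(nhanes, m = 2, maxit = 2, seed = 1, printFlag = FALSE)

# There is no predict() method for a "mids" object; the imputations
# are already stored in it. complete() fills them into the data.
filled <- complete(imp, 1)   # the first of the m completed data sets

anyNA(nhanes)   # the original data contain missing values
anyNA(filled)   # the completed data should not
```

For the question's data, the same call would be complete(testMice, 1), which returns the 100,000 imputed rows and nothing for the rows that were never passed to mice().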

4 Comments

Are there any packages that would do something like what I am trying to do?
@SonuMishra Well what you can do is to go ahead and fit a statistical model with something like glm or a random forest then use that to predict the missing values. Or just use mean imputation, which is much faster. You can use the RRF package's function na.roughfix for that.
@hack-R those are very poor suggestions in my opinion. mice uses EM, which is a vastly superior imputation strategy to GLM. That is, unless you meant an iterative algorithm via GLM that converges to a solution; but that is not much different from EM.
@Alex Well, I'm sorry you feel that way, perhaps it's guided by ignorance and inexperience. First of all, you're comparing something which is a practical alternative against something which is not an option due to computational resource requirements. Secondly, you're making a strong claim with nothing to back it up.
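
The simple fallback mentioned in the comments (median or mode fill, roughly what na.roughfix does) can be sketched in base R so that the fill values are learned from a subset and then applied to every row. The helper names fit_fill_values and apply_fill_values and the toy data frame are hypothetical, made up for illustration; this is single-value imputation, not a substitute for what mice does.

```r
# Hypothetical helper: learn one fill value per column from a sample.
fit_fill_values <- function(sample_df) {
  lapply(sample_df, function(col) {
    if (is.numeric(col)) {
      median(col, na.rm = TRUE)          # numeric columns: median
    } else {
      names(which.max(table(col)))       # categorical columns: most frequent level
    }
  })
}

# Hypothetical helper: apply the stored fill values to any set of rows.
apply_fill_values <- function(df, fills) {
  for (nm in names(fills)) {
    miss <- is.na(df[[nm]])
    df[[nm]][miss] <- fills[[nm]]
  }
  df
}

# Toy data standing in for the real 4M x 17 data frame.
toy <- data.frame(
  x = c(1, NA, 3, 4, NA, 10),
  g = factor(c("a", "b", NA, "a", "a", NA))
)

fills  <- fit_fill_values(toy[1:4, ])     # "fit" on a subset of rows
filled <- apply_fill_values(toy, fills)   # fill the whole data frame
```

Because the fill values are plain numbers and level names, the "fit" step is cheap enough to run on a sample, and applying it to all rows is a single pass per column.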
