Error while imputing large dataframe using mice

Question

I have been trying to impute a data set with the mice package, using the following code:

my_imp <- mice(train, m=5, method="pmm", maxit=50)

and I got this error:

 iter imp variable
  1   1  existence.expectancy.indexError in solve.default(xtx + diag(pen)) : 
  system is computationally singular: reciprocal condition number = 3.96306e-17

Here is a sample from my dataframe (dput). The error probably results from the existence.expectancy.index column.

structure(list(galactic.year = c(990025L, 990025L, 990025L, 990025L, 
990025L), galaxy = c("Large Magellanic Cloud (LMC)", "Camelopardalis B", 
"Virgo I", "UGC 8651 (DDO 181)", "Tucana Dwarf"), existence.expectancy.index = c(0.628656922579983, 
0.818082166933375, 0.659443179243005, 0.555861648365899, 0.991196351622249
)), class = "data.frame", row.names = c(NA, -5L))

Please give me ideas on how to solve the error.

Hello and welcome to SO. Could you share a sample of your data? Without that it will be very hard to find out where the problem lies. You can use dput() or dput(head()) if the data set is large. Please help us help you.
– Jan, Commented Jun 9, 2020 at 6:41
Hi, please read the related Q/A: stackoverflow.com/a/58832614/6574038 Possible duplicate.
– jay.sf, Commented Jun 9, 2020 at 6:42
@Afrikan_patriot What is different in your case, such that the error isolation approach provided there won't work?
– jay.sf, Commented Jun 9, 2020 at 7:25
@Afrikan_patriot Thanks for updating. However, when you use dput, it is better not to change the output when providing it. I tried to fix that in an edit to your question. If you want to dput a subset of your data, use e.g. dput(dtrain[1:30, ]). Anyway, I tried out your code and data and wasn't able to reproduce your error. Also, the question in my last comment might still be open.
– jay.sf, Commented Jun 9, 2020 at 7:45
@jay.sf I've got the solution. The problem with using mice for imputation here is the large number of unbalanced factor variables in this dataset. When these are turned into dummy variables, there is a high probability that one column will be a linear combination of others. Since the default imputation methods involve linear regression, this results in an X matrix that cannot be inverted. One solution is to change the default imputation method to one that does not rely on linear regression.
– Afrikan_patriot, Commented Jun 9, 2020 at 10:38

Afrikan_patriot · Accepted Answer · 2020-06-09 10:49:33Z

1

The problem with using mice for imputation here is the large number of unbalanced factor variables in this dataset. When these are turned into dummy variables, there is a high probability that one column will be a linear combination of others. Since the default imputation methods involve linear regression, this results in an X matrix whose cross-product cannot be inverted, which is what the solve.default() error reports.
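The collinearity described above can be reproduced in base R with a minimal sketch. The data frame below is hypothetical toy data, not the asker's data: two factors are constructed so that their dummy columns coincide. Note that the real error involves a ridge penalty (`xtx + diag(pen)`), so there the matrix is nearly rather than exactly singular; this sketch only shows the underlying mechanism.

```r
# Hypothetical toy data: the levels of `group` mirror the levels of `galaxy`,
# so their dummy columns are identical.
df <- data.frame(
  galaxy = factor(c("A", "B", "C", "A", "B", "C")),
  group  = factor(c("x", "y", "z", "x", "y", "z"))
)

# Expand the factors into dummy variables, as a regression would.
X <- model.matrix(~ galaxy + group, data = df)

# The rank is lower than the number of columns: some columns are redundant.
rank_X <- qr(X)$rank
n_cols <- ncol(X)

# Inverting X'X therefore fails; the error message is captured instead of
# stopping the script.
xtx <- crossprod(X)
inv_result <- tryCatch(solve(xtx), error = function(e) conditionMessage(e))

print(c(rank = rank_X, columns = n_cols))
print(inv_result)
```

Comparing `qr(X)$rank` with `ncol(X)` on the model matrix of the real data is one way to check whether redundant dummy columns are present.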

One solution is to change the default imputation method to one that does not rely on linear regression, for example a tree-based method such as method="cart".
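A minimal sketch of that change, assuming the mice package is installed. The `train` data frame here is hypothetical toy data that only borrows the column names from the question, and `maxit` is reduced to keep the run short. `"cart"` is one of mice's built-in tree-based methods; whether it suits the real data is a judgment call.

```r
# Runs only if mice is available; `train` below is made-up illustration data.
if (requireNamespace("mice", quietly = TRUE)) {
  set.seed(1)
  n <- 60
  train <- data.frame(
    galactic.year = sample(c(990025L, 991020L, 992016L), n, replace = TRUE),
    galaxy = factor(sample(c("Virgo I", "Tucana Dwarf", "Camelopardalis B"),
                           n, replace = TRUE)),
    existence.expectancy.index = runif(n, 0.5, 1)
  )
  # Introduce some missing values to impute.
  train$existence.expectancy.index[sample(n, 10)] <- NA

  # Tree-based imputation instead of the regression-based "pmm".
  my_imp <- mice::mice(train, m = 5, method = "cart", maxit = 5,
                       seed = 123, printFlag = FALSE)

  # Problems noted by mice during the run (NULL if there were none).
  print(my_imp$loggedEvents)

  # Extract the first completed data set.
  train_complete <- mice::complete(my_imp, 1)
}
```

Inspecting `my_imp$loggedEvents` after a run is a quick way to see whether mice dropped constant or collinear predictors on its own.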

