
I have been trying to impute a data set with the mice package, using the following code:

my_imp <- mice(train, m=5, method="pmm", maxit=50)

and I got this error:

 iter imp variable
  1   1  existence.expectancy.index
Error in solve.default(xtx + diag(pen)) : 
  system is computationally singular: reciprocal condition number = 3.96306e-17

Here is a sample of my data frame, produced with dput(). The error probably comes from the existence.expectancy.index column.

structure(list(galactic.year = c(990025L, 990025L, 990025L, 990025L, 
990025L), galaxy = c("Large Magellanic Cloud (LMC)", "Camelopardalis B", 
"Virgo I", "UGC 8651 (DDO 181)", "Tucana Dwarf"), existence.expectancy.index = c(0.628656922579983, 
0.818082166933375, 0.659443179243005, 0.555861648365899, 0.991196351622249
)), class = "data.frame", row.names = c(NA, -5L))

Any ideas on how to fix this error would be appreciated.

  • Hello and welcome to SO. Could you share a sample of your data? Without it, it will be very hard to find out where the problem lies. You can use dput(), or dput(head()) if the data set is large. Please help us help you. Commented Jun 9, 2020 at 6:41
  • Hi, please read related Q/A: stackoverflow.com/a/58832614/6574038 Possible duplicate. Commented Jun 9, 2020 at 6:42
  • @Afrikan_patriot What is different in your case, such that the error-isolation approach provided there won't work? Commented Jun 9, 2020 at 7:25
  • @Afrikan_patriot Thanks for updating. However, when you use dput, it's better not to change the output when posting it. I tried to fix that in an edit to your question. If you want to dput a subset of your data, use e.g. dput(dtrain[1:30, ]). Anyway, I tried your code and data and wasn't able to reproduce your error. The question in my last comment might also still be open. Commented Jun 9, 2020 at 7:45
  • @jay.sf I've got the solution. Commented Jun 9, 2020 at 10:38

1 Answer


The problem with using mice for imputation here is the large number of unbalanced factor variables in this data set. When these are expanded into dummy variables, there is a high probability that one column will be a linear combination of others. Since the default imputation methods involve linear regression, the resulting cross-product matrix X'X (the xtx in the error message) cannot be inverted.
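A minimal base-R sketch of that mechanism, using hypothetical toy data rather than the question's data: the factor f is expanded into dummy columns, and the extra column is_b happens to equal one of them exactly. The design matrix then loses rank, and inverting X'X fails in the same way as the solve() call in the error.

```r
# Hypothetical toy data: f is a factor, and is_b is a 0/1 column that
# duplicates the dummy column R creates for level "b" of f.
set.seed(42)
toy <- data.frame(
  y = rnorm(6),
  f = factor(c("a", "a", "b", "b", "c", "c"))
)
toy$is_b <- as.numeric(toy$f == "b")

# Treatment contrasts give the columns (Intercept), fb, fc, is_b
X <- model.matrix(y ~ f + is_b, data = toy)
xtx <- crossprod(X)  # X'X, the matrix a linear-regression step must invert

print(qr(X)$rank)    # rank 3, although X has 4 columns
print(rcond(xtx))    # reciprocal condition number at or near zero

# Inverting X'X fails, analogous to solve.default(xtx + diag(pen)) in mice
res <- tryCatch(solve(xtx), error = function(e) conditionMessage(e))
print(res)
```

With many unbalanced factors, a rare level can also leave a dummy column that is all zeros among the observed rows, which produces the same singularity.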

One solution is to change the default imputation method to one that does not rely on linear regression.
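A hedged sketch of that suggestion: mice ships a tree-based method, "cart", which fits classification and regression trees (via rpart) and so does not need to invert X'X. Because the full train data isn't available, this assumes mice 3.x is installed and uses the nhanes data set bundled with mice; with the real data, the corresponding change would be method = "cart" in the original mice(train, ...) call.

```r
library(mice)  # assumes mice (>= 3.0) is installed; "cart" also uses rpart

# nhanes ships with mice: 25 rows, with missing values in bmi, hyp and chl
data(nhanes, package = "mice")

# A single method string is applied to every incomplete variable, so
# bmi, hyp and chl are imputed with trees instead of pmm's regression step
imp <- mice(nhanes, m = 5, method = "cart", maxit = 20,
            seed = 123, printFlag = FALSE)

# Method used per column ("" marks complete columns such as age)
print(imp$method)

# First completed data set: no missing values should remain
completed <- complete(imp, 1)
print(colSums(is.na(completed)))
```

Tree-based imputation trades the linear model's assumptions for robustness to collinear and sparse dummy columns; it is worth checking the imputed distributions afterwards (e.g. with densityplot(imp)).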
