R MICE Imputation

Question

data=data.frame("student"=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5),
"time"=c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4),
"v1"=c(16,12,14,12,17,16,12,12,13,12,16,16,10,10,14,17,17,12,10,11),
"v2"=c(1,1,3,2,2,2,3,1,2,1,2,1,3,1,1,2,3,3,1,2),
"v3"=c(4,1,4,4,2,2,2,2,1,3,2,3,1,2,2,1,4,1,1,4),
"v4"=c(NA,27,NA,42,40,48,45,25,29,NA,NA,27,NA,NA,NA,NA,NA,NA,44,39),
"v5"=c(NA,1,NA,NA,1,3,3,2,NA,NA,NA,1,NA,NA,NA,NA,3,2,4,1),
"v6"=c(NA,0,1,NA,1,NA,1,NA,0,NA,1,1,NA,NA,NA,NA,0,0,NA,0),
"v7"=c(0,1,1,NA,0,1,1,0,1,0,NA,0,NA,NA,NA,NA,0,1,NA,1),
"v8"=c(1,NA,0,1,0,0,NA,1,1,NA,0,0,NA,NA,NA,NA,1,0,NA,1))

This is my sample data and with it I am seeking to:

A. For time = 1 use v1-v3 to impute v4-v8 using MICE (v4 is continuous, v5 is categorical, v6-v8 is binary)

B. After imputed values are imputed for time = 1, I want to fill NA values that follow with the previous value. So if the variable for time 1-4 is: NA,NA,0,1 and the imputed value at time 1 is 1, then it could be: 1-1-0-1

I attemped:

dataNEW <- mice(data[,data$time == 1],m=5,maxit=50,meth='pmm',seed=500)

Edward · Accepted Answer · 2020-02-26 05:06:49Z

A. For time = 1 use v1-v3 to impute v4-v8 using MICE (v4 is continuous, v5 is categorical, v6-v8 is binary)

First, variables v5 - v6 have to be converted to factors:

data$v5 <- factor(data$v5)
data$v6 <- factor(data$v6)
data$v7 <- factor(data$v7)
data$v8 <- factor(data$v8)

Create a predictor matrix to tell mice to use only v1-v3 to predict v4-v8:

Pred_Matrix <- 1 - diag(ncol(data))
Pred_Matrix[,c(1:2, 6:10)] <- 0

Impute using only 1 imputation (the default is 5) because all you want are the imputed values; you're not doing anything else such as pooling the results for modelling.

impA <- mice(subset(data, subset = time==1), pred = Pred_Matrix, m = 1)

The imputed data can be extracted using the complete function (from the mice package, not tidyr).

B. After imputed values are imputed for time = 1, I want to fill NA values that follow with the previous value. So if the variable for time 1-4 is: NA,NA,0,1 and the imputed value at time 1 is 1, then it could be: 1-1-0-1

library(dplyr)
library(tidyr)  # Needed for the fill function

mice::complete(impA) %>%
  rbind(subset(data, subset=time!=1)) %>%
  arrange(student, time) %>%
  group_by(student) %>%
  fill(v4:v8)

# A tibble: 20 x 10
# Groups:   student [5]
   student  time    v1    v2    v3    v4 v5    v6    v7    v8   
     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <fct>
 1       1     1    16     1     4    40 2     1     0     1    
 2       1     2    12     1     1    27 1     0     1     1    
 3       1     3    14     3     4    27 1     1     1     0    
 4       1     4    12     2     4    42 1     1     1     1    
 5       2     1    17     2     2    40 1     1     0     0    
 6       2     2    16     2     2    48 3     1     1     0    
 7       2     3    12     3     2    45 3     1     1     0   
 8       2     4    12     1     2    25 2     1     0     1    
 9       3     1    13     2     1    29 1     0     1     1    
10       3     2    12     1     3    29 1     0     0     1    
11       3     3    16     2     2    29 1     1     0     0    
12       3     4    16     1     3    27 1     1     0     0    
13       4     1    10     3     1    40 1     0     0     0    
14       4     2    10     1     2    40 1     0     0     0    
15       4     3    14     1     2    40 1     0     0     0    
16       4     4    17     2     1    40 1     0     0     0    
17       5     1    17     3     4    40 3     0     0     1    
18       5     2    12     3     1    40 2     0     1     0    
19       5     3    10     1     1    44 4     0     1     0    
20       5     4    11     2     4    39 1     0     1     1

Data

Note, I had to change the first value of v5 to 2, otherwise the polyreg imputation fails (there are only two categories for time=1).

data=data.frame("student"=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5),
                "time"=c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4),
                "v1"=c(16,12,14,12,17,16,12,12,13,12,16,16,10,10,14,17,17,12,10,11),
                "v2"=c(1,1,3,2,2,2,3,1,2,1,2,1,3,1,1,2,3,3,1,2),
                "v3"=c(4,1,4,4,2,2,2,2,1,3,2,3,1,2,2,1,4,1,1,4),
                "v4"=c(NA,27,NA,42,40,48,45,25,29,NA,NA,27,NA,NA,NA,NA,NA,NA,44,39),
                "v5"=c(2,1,NA,NA,1,3,3,2,NA,NA,NA,1,NA,NA,NA,NA,3,2,4,1),
                "v6"=c(NA,0,1,NA,1,NA,1,NA,0,NA,1,1,NA,NA,NA,NA,0,0,NA,0),
                "v7"=c(0,1,1,NA,0,1,1,0,1,0,NA,0,NA,NA,NA,NA,0,1,NA,1),
                "v8"=c(1,NA,0,1,0,0,NA,1,1,NA,0,0,NA,NA,NA,NA,1,0,NA,1))

thank you thank you thank you! Is there a data.table solution instead of tidy?
say if we just want to impute values for time 1 and leave everything else alone I wonder if there is a data.table solution for this!
This answer is misleading, because this is not how multiple imputation with chained equations (MICE) is done! Using only one iteration m=1 is simply not correct. Also mice::complete() without argument, as shown, defaults to 1, which would use only the first imputed dataset. Please read: stackoverflow.com/a/66059183/6574038

Collectives™ on Stack Overflow

R MICE Imputation

1 Answer 1

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related