
I currently have the below data frame:

structure(list(cluster = c(2L, 3L, 5L, 5L, 6L, 6L, 7L, 9L, 9L, 
10L, 10L), treatment = c("TreatmentA", "TreatmentA", "TreatmentA", 
"TreatmentB", "TreatmentA", "TreatmentB", "TreatmentA", "TreatmentA", 
"TreatmentB", "TreatmentA", "TreatmentB"), count = c(6, 6, 6, 
6, 6, 6, 6, 2, 6, 1, 2)), row.names = c(NA, 11L), class = "data.frame")

I would like to add the missing rows so that every cluster from 1 to 10 has two rows, one with 'TreatmentA' and one with 'TreatmentB' in the 'treatment' column. The added rows should have a value of 0 in the 'count' column.

Below is the data frame I am trying to create:

structure(list(cluster = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 
5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L), treatment = c("TreatmentA", 
"TreatmentB", "TreatmentA", "TreatmentB", "TreatmentA", "TreatmentB", 
"TreatmentA", "TreatmentB", "TreatmentA", "TreatmentB", "TreatmentA", 
"TreatmentB", "TreatmentA", "TreatmentB", "TreatmentA", "TreatmentB", 
"TreatmentA", "TreatmentB", "TreatmentA", "TreatmentB"), count = c(0L, 
0L, 6L, 0L, 6L, 0L, 0L, 0L, 6L, 6L, 6L, 6L, 6L, 0L, 0L, 0L, 2L, 
6L, 1L, 2L)), class = "data.frame", row.names = c(NA, -20L))

2 Answers

2

Use tidyr::complete() to fill missing cluster–treatment pairs:

library(dplyr)  # loaded for the %>% pipe
library(tidyr)  # provides complete()

# df is the data frame from the question
df %>%
  complete(cluster = 1:10,
           treatment = c("TreatmentA", "TreatmentB"),
           fill = list(count = 0))

This ensures every cluster 1–10 has both treatments, filling missing count values with 0.
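
For a quick check, here is a self-contained sketch of the same call on the question's data. The names `df` and `out` are assumptions (the question does not name its data frame), and the `as.data.frame()` step is only there in case the result comes back as a tibble and a plain data frame is wanted.

```r
library(tidyr)

# Data from the question; the name `df` is an assumption.
df <- data.frame(
  cluster   = c(2L, 3L, 5L, 5L, 6L, 6L, 7L, 9L, 9L, 10L, 10L),
  treatment = c("TreatmentA", "TreatmentA", "TreatmentA", "TreatmentB",
                "TreatmentA", "TreatmentB", "TreatmentA", "TreatmentA",
                "TreatmentB", "TreatmentA", "TreatmentB"),
  count     = c(6, 6, 6, 6, 6, 6, 6, 2, 6, 1, 2)
)

# Expand to all 10 x 2 cluster-treatment pairs; new rows get count = 0.
out <- complete(df,
                cluster = 1:10,
                treatment = c("TreatmentA", "TreatmentB"),
                fill = list(count = 0))

# Drop any tibble classes so the result is a plain data.frame.
out <- as.data.frame(out)

nrow(out)            # 20 rows: 10 clusters x 2 treatments
sum(out$count == 0)  # 9 rows were added with count = 0
```

The 11 original rows keep their counts, so only the 9 newly created pairs end up with 0.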


1

Without relying on external packages, we can build the full grid of cluster–treatment pairs with expand.grid(), left-join the data onto it with merge(), and replace the resulting NAs in count with 0:

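# X is the question's data frame (see Input below)
# stringsAsFactors = FALSE keeps treatment as character, as in the desired output
expand.grid(cluster=seq.int(10), treatment=c('TreatmentA', 'TreatmentB'),
            stringsAsFactors=FALSE) |>
  merge(X, by=c('cluster', 'treatment'), all.x=TRUE) |>
  transform(count = replace(count, is.na(count), 0))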
   cluster  treatment count
1        1 TreatmentA     0
2        1 TreatmentB     0
3        2 TreatmentA     6
4        2 TreatmentB     0
5        3 TreatmentA     6
6        3 TreatmentB     0
7        4 TreatmentA     0
8        4 TreatmentB     0
9        5 TreatmentA     6
10       5 TreatmentB     6
11       6 TreatmentA     6
12       6 TreatmentB     6
13       7 TreatmentA     6
14       7 TreatmentB     0
15       8 TreatmentA     0
16       8 TreatmentB     0
17       9 TreatmentA     2
18       9 TreatmentB     6
19      10 TreatmentA     1
20      10 TreatmentB     2

Input

> dput(X)
structure(list(cluster = c(2L, 3L, 5L, 5L, 6L, 6L, 7L, 9L, 9L, 
10L, 10L), treatment = c("TreatmentA", "TreatmentA", "TreatmentA", 
"TreatmentB", "TreatmentA", "TreatmentB", "TreatmentA", "TreatmentA", 
"TreatmentB", "TreatmentA", "TreatmentB"), count = c(6, 6, 6, 
6, 6, 6, 6, 2, 6, 1, 2)), row.names = c(NA, 11L), class = "data.frame")
