Sum multiple columns based on a factor variable

Question

I have a data set like this:

Date    Facility    Meas_1  Meas_2  Meas_3
1/1/2021    C   0   0   0
1/1/2021    Ge  0   1   0
1/1/2021    A   0   0   1
1/1/2021    A   0   0   0
1/1/2021    P   1   0   0
1/1/2021    C   0   0   0
1/1/2021    Ge  0   0   0
1/1/2021    P   0   0   0
1/1/2021    R   1   1   0
1/1/2021    C   0   0   0
1/2/2021    Ga  0   1   0
1/2/2021    C   0   0   0
1/2/2021    C   0   1   0
1/2/2021    A   1   0   0
1/2/2021    E   0   0   0

I need to find the sum of Meas_1, Meas_2, and Meas_3 for each combination of Facility and Date. Facility is a factor and the measures are binary (1 is true, 0 is false), so the sums give a count of each measure at each facility on each date.

I've tried aggregate() with no luck. Thank you!

Are you looking for df %>% group_by(Date, Facility) %>% summarise(across(starts_with("Meas"), sum), .groups = "drop")?
– Martin Gal, Commented Apr 25, 2022 at 22:40
aggregate(.~Date + Facility, df, mean) is the way to use aggregate
– Onyambu, Commented Apr 25, 2022 at 23:10

Abdur Rohman · Accepted Answer · 2022-04-26 00:29:41Z

Base-R Solution

This is very similar to onyambu's comment, but here the target columns are named explicitly, which I find makes the code easier to understand:

# Your data
dat <- structure(list(
  Date = c("1/1/2021", "1/1/2021", "1/1/2021", "1/1/2021", "1/1/2021",
           "1/1/2021", "1/1/2021", "1/1/2021", "1/1/2021", "1/1/2021",
           "1/2/2021", "1/2/2021", "1/2/2021", "1/2/2021", "1/2/2021"),
  Facility = c("C", "Ge", "A", "A", "P", "C", "Ge", "P", "R", "C",
               "Ga", "C", "C", "A", "E"),
  Meas_1 = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L),
  Meas_2 = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L),
  Meas_3 = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)
), class = "data.frame", row.names = c(NA, -15L))


aggregate(cbind(Meas_1, Meas_2, Meas_3) ~ Date + Facility, dat, sum)

      Date Facility Meas_1 Meas_2 Meas_3
1 1/1/2021        A      0      0      1
2 1/2/2021        A      1      0      0
3 1/1/2021        C      0      0      0
4 1/2/2021        C      0      1      0
5 1/2/2021        E      0      0      0
6 1/2/2021       Ga      0      1      0
7 1/1/2021       Ge      0      1      0
8 1/1/2021        P      1      0      0
9 1/1/2021        R      1      1      0
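
As a cross-check, here is a minimal sketch comparing the explicit cbind() form with the dot-formula form from onyambu's comment, using sum as the aggregating function. It rebuilds the same data compactly so it runs on its own. The names explicit and dot_form are only illustrative. The two forms should agree here because the three Meas columns are the only non-grouping columns in the data.

```r
# Same data as above, rebuilt compactly so this block runs on its own
dat <- data.frame(
  Date = rep(c("1/1/2021", "1/2/2021"), times = c(10, 5)),
  Facility = c("C", "Ge", "A", "A", "P", "C", "Ge", "P", "R", "C",
               "Ga", "C", "C", "A", "E"),
  Meas_1 = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L),
  Meas_2 = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L),
  Meas_3 = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)
)

# Explicit form: name the columns to sum on the left-hand side
explicit <- aggregate(cbind(Meas_1, Meas_2, Meas_3) ~ Date + Facility, dat, sum)

# Dot form: "." stands for every column not used on the right-hand side
dot_form <- aggregate(. ~ Date + Facility, dat, sum)

# Compare the summed columns of the two results
all.equal(explicit[, c("Meas_1", "Meas_2", "Meas_3")],
          dot_form[, c("Meas_1", "Meas_2", "Meas_3")])
```

If the data later gains another non-grouping column, the dot form would try to sum it as well, which is one reason to prefer the explicit form.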

data.table solution

To complement Martin Gal's dplyr solution, here is a data.table solution:

library(data.table)

dt.dat <- as.data.table(dat)
# Sum each column listed in .SDcols within every Date/Facility group
dt.dat[, lapply(.SD, sum), .SDcols = c("Meas_1", "Meas_2", "Meas_3"), by = .(Date, Facility)]
       Date Facility Meas_1 Meas_2 Meas_3
1: 1/1/2021        C      0      0      0
2: 1/1/2021       Ge      0      1      0
3: 1/1/2021        A      0      0      1
4: 1/1/2021        P      1      0      0
5: 1/1/2021        R      1      1      0
6: 1/2/2021       Ga      0      1      0
7: 1/2/2021        C      0      1      0
8: 1/2/2021        A      1      0      0
9: 1/2/2021        E      0      0      0
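
For reference, Martin Gal's dplyr suggestion from the comments can be run on the same data as sketched below. This assumes dplyr 1.0.0 or later, since it relies on across(). The data is rebuilt compactly so the block runs on its own, and the name res is only illustrative.

```r
library(dplyr)

# Same data as above, rebuilt compactly so this block runs on its own
dat <- data.frame(
  Date = rep(c("1/1/2021", "1/2/2021"), times = c(10, 5)),
  Facility = c("C", "Ge", "A", "A", "P", "C", "Ge", "P", "R", "C",
               "Ga", "C", "C", "A", "E"),
  Meas_1 = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L),
  Meas_2 = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L),
  Meas_3 = c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)
)

# Group by date and facility, then sum every column whose name starts with "Meas"
res <- dat %>%
  group_by(Date, Facility) %>%
  summarise(across(starts_with("Meas"), sum), .groups = "drop")

res
```

Selecting the columns with starts_with("Meas") avoids listing them by name, at the cost of also picking up any future column that happens to share the prefix.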
