I have the R code below and am trying to add a new column that holds the sum of the values in the first column, grouped by a variable.

I used the dplyr package and the mutate function, but I get the following warning message when running the code:

total_tests$total <- total_tests %>% group_by(school_id) %>% mutate(total=sum(distinct_tests)) 

Warning message: In cbind(x[0:(framecol - 1)], cols) : number of rows of result is not a multiple of vector length (arg 1)

dput output of the first twenty rows:

structure(list(distinct_tests = c(121L, 7L, 32L, 12L, 1L, 1L, 
1L, 1L, 2L, 4L, 3L, 15L, 1L, 5L, 49L, 2L, 2L, 3L, 1L, 38L), test_type = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L), .Label = c("EXAM", "HW", "SELF_SERVICE", "SHORT_TEST"
), class = "factor"), school_id = structure(c(113L, 113L, 113L, 
113L, 113L, 217L, 217L, 217L, 217L, 20L, 20L, 21L, 21L, 21L, 
84L, 84L, 84L, 84L, 94L, 94L), .Label = c("1000", "1002", "1003", 
"1004", "1006", "1007", "1008", "1010", "1011", "1012", "1013", 
"1014", "1015", "1019", "1020", "1021", "1022", "1023", "1024", 
"103", "104", "1042", "1043", "1044", "1045", "1053", "1054", 
"1056", "1057", "1058", "1059", "1060", "1061", "1062", "1063", 
"1064", "1065", "1066", "1068", "1069", "1070", "1071", "1072", 
"1073", "1074", "1075", "1076", "1077", "1078", "1155", "1156", 
"1157", "1158", "1159", "1176", "1217", "1227", "1228", "1234", 
"1235", "1257", "1261", "1262", "1263", "1264", "1265", "1266", 
"1267", "1268", "1273", "1274", "1275", "1276", "1277", "1278", 
"1279", "1281", "1282", "1305", "1306", "1343", "1344", "1414", 
"144", "1560", "1593", "1612", "1614", "1645", "1646", "1650", 
"1653", "1654", "166", "167", "1676", "1677", "1679", "1681", 
"1682", "1683", "1685", "1696", "1711", "1773", "186", "1871", 
"1912", "1914", "2196", "2217", "2280", "23", "2301", "264", 
"2640", "2642", "2667", "2668", "2720", "2721", "2746", "2791", 
"284", "285", "2872", "2888", "304", "3044", "3184", "3195", 
"3220", "3221", "3222", "3224", "3225", "3238", "3307", "3324", 
"3347", "3362", "346", "3489", "3496", "3511", "3516", "3591", 
"366", "368", "369", "3749", "3771", "3849", "386", "387", "388", 
"3886", "389", "390", "3912", "3913", "392", "393", "3936", "3937", 
"394", "395", "396", "397", "399", "400", "4026", "4032", "4049", 
"4062", "4072", "4147", "424", "428", "430", "4310", "432", "433", 
"434", "464", "484", "485", "486", "487", "488", "525", "526", 
"528", "546", "548", "564", "565", "566", "567", "568", "569", 
"584", "585", "586", "589", "590", "591", "593", "594", "595", 
"596", "626", "627", "645", "646", "647", "68", "686", "688", 
"705", "744", "745", "746", "747", "748", "749", "765", "784", 
"785", "786", "788", "789", "805", "807", "808", "809", "810", 
"811", "812", "813", "816", "817", "818", "819", "820", "821", 
"822", "824", "828", "829", "830", "831", "832", "833", "834", 
"835", "836", "837", "838", "840", "841", "843", "844", "845", 
"846", "847", "849", "850", "851", "852", "853", "855", "856", 
"857", "860", "863", "864", "865", "866", "867", "868", "869", 
"870", "871", "872", "875", "877", "878", "879", "881", "882", 
"884", "885", "886", "909", "910", "912", "916", "917", "925", 
"929", "930", "933", "938", "939", "941", "944", "948", "954", 
"955", "957", "962", "963", "967", "968", "969", "973", "974", 
"975", "977", "978", "979", "981", "NULL"), class = "factor")), row.names = c(NA, 
20L), class = "data.frame")
  • You have to assign the result to the data frame, not to a column: total_tests <- total_tests %>% group_by(school_id) %>% mutate(total=sum(distinct_tests)) Commented Oct 2, 2019 at 8:41
  • That is fine, but if I use this line total_tests <- total_tests %>% group_by(school_id) %>% mutate(total=sum(distinct_tests)) I get a fourth column with the total sum, not with the grouped sum. Commented Oct 2, 2019 at 9:28
  • It is strange, since I get a fourth column with the grouped sum. Commented Oct 2, 2019 at 10:17
  • It is weird, I have cleared the space and again loaded the dplyr library. This time it really worked, but I swear that before the output was a different one. Anyways. Thank you for your help ! Commented Oct 2, 2019 at 10:39
  • Sometimes this happens because you have loaded a package such as plyr that has functions with the same names, such as mutate. If you don't specify the package (by writing dplyr::mutate), R may pick the wrong function. Commented Oct 2, 2019 at 11:54
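
The masking issue described in the last comment can be sidestepped by qualifying each call with its package. The following is a minimal sketch, not the asker's actual session: the data frame toy and its values are made up for illustration, and the assumption is that a plyr-style mutate ignored the grouping, which would explain the ungrouped total reported above.

```r
library(dplyr)

# Hypothetical miniature of the question's data (values are made up).
toy <- data.frame(
  school_id      = c("113", "113", "217"),
  distinct_tests = c(121L, 7L, 1L)
)

# Namespace-qualified calls always resolve to dplyr, even if another
# package that also exports mutate() was attached later.
result <- toy %>%
  dplyr::group_by(school_id) %>%
  dplyr::mutate(total = sum(distinct_tests)) %>%
  dplyr::ungroup()

# Reports which package the unqualified name `mutate` currently resolves to.
environmentName(environment(mutate))
```

If the last line reports a package other than dplyr, restarting the session or loading dplyr after the other package should restore the expected behaviour.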

2 Answers

Building on iago's comment, you also need to use the summarise function rather than mutate if you want the output collapsed to one row per school_id.

totals <- total_tests %>% 
  group_by(school_id) %>% 
  summarise(total=sum(distinct_tests)) 

1 Comment

You are right! But I would like to break the sum down by group (in this case, school_id). That is why I thought of using mutate instead of summarise.
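
To make the difference between the two verbs concrete, here is a small sketch on made-up data (toy is hypothetical, not the asker's data frame). summarise returns one row per group, while mutate keeps every original row and repeats the group total on each of them.

```r
library(dplyr)

# Hypothetical miniature of the question's data (values are made up).
toy <- data.frame(
  school_id      = c("113", "113", "217"),
  distinct_tests = c(121L, 7L, 1L)
)

# summarise(): collapses each group to a single row.
per_group <- toy %>%
  group_by(school_id) %>%
  summarise(total = sum(distinct_tests))

# mutate(): keeps all rows and attaches the group total to each one.
per_row <- toy %>%
  group_by(school_id) %>%
  mutate(total = sum(distinct_tests)) %>%
  ungroup()

nrow(per_group)  # 2 rows, one per school_id
nrow(per_row)    # 3 rows, same as the input
```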

Since you are already using mutate, you do not need total_tests$total to create the new column, because mutate already does this. So I think you can try this:

total_tests <- total_tests %>% group_by(school_id) %>% mutate(total=sum(distinct_tests)) 
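
The warning in the question most likely arises because the right-hand side of the original assignment is a whole data frame rather than a single vector. If assigning to one column is still preferred, the sketch below shows two possible ways to do it on made-up data (toy is hypothetical): extracting the computed column with pull, or using base R's ave without dplyr at all.

```r
library(dplyr)

# Hypothetical miniature of the question's data (values are made up).
toy <- data.frame(
  school_id      = c("113", "113", "217"),
  distinct_tests = c(121L, 7L, 1L)
)

# Option 1: compute with dplyr, then pull out only the new column as a vector.
toy$total_dplyr <- toy %>%
  group_by(school_id) %>%
  mutate(total = sum(distinct_tests)) %>%
  pull(total)

# Option 2: base R; ave() applies FUN within each group and returns
# a vector of the same length as the input.
toy$total_base <- ave(toy$distinct_tests, toy$school_id, FUN = sum)
```

Both columns should hold the same per-school totals. Assigning the whole pipeline result back to the data frame, as in the line above, remains the simpler choice.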
