I am using R to write a report for a class, and I have a fairly large binary dataset (1 and NA) indicating presence or absence.

`# A tibble: 149 × 31
    Vide Copé. Ca…¹ Copé.…² Copé.…³ Copé.…⁴ Polyc…⁵ Néréi…⁶ Pecti…⁷ Crang…⁸ Mysid…⁹
   <dbl>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1     0          0       0       0       0       0       0       0       0       0
 2     0          0       0       0       0       0       1       0       0       0
 3     0          0       0       0       0       0       1       0       0       0
 4     0          0       0       0       0       0       0       0       0       1
 5     0          0       0       0       0       0       1       0       0       0
 6     0          0       0       0       0       0       0       0       0       0
 7     0          0       0       0       0       0       1       0       0       0
 8     0          0       0       0       0       0       1       0       0       0
 9     0          0       0       0       0       0       0       0       0       0
10     0          0       0       0       0       0       0       0       0       0
# … with 139 more rows, 21 more variables: `Carides sp.` <dbl>, Amphipodes <dbl>,
#   `Pandalidés(crevette nordique)` <dbl>, Cumacés <dbl>, Isopodes <dbl>,
#   `Crustacés sp.` <dbl>, Éperlan...17 <dbl>, Capucette <dbl>,
#   `Épinoche sp.` <dbl>, `Poisson sp.` <dbl>, Gastéropode <dbl>, Bivalve <dbl>,
#   `Poulamon Atlantique` <dbl>, `Éperlan arc-en-ciel` <dbl>, Éperlan...25 <dbl>,
#   HARENG <dbl>, OSMÉRIDÉ <dbl>, Moronidé <dbl>, `Bar rayé` <dbl>, Baret <dbl>,
#   `Alose savoureuse` <dbl>, and abbreviated variable names ¹​`Copé. Cala.`, …
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names`

I need to show the frequency of presence for each category:

           Frequency
Vide           0
Copépodes      2
Néréidés sp.   5
etc.

Is there a way to do this without recreating the dataset from scratch? I can't find out how to do it online. This is my first question here and I'm quite new to R, so I'm not sure how to approach this.

3 Answers

Sample data:

set.seed(42)
dat <- as.data.frame(lapply(setNames(nm=letters[1:5]), function(z) sample(0:1, 10, replace=TRUE)))
dat
#    a b c d e
# 1  0 0 0 1 0
# 2  0 1 0 1 0
# 3  0 0 0 1 1
# 4  0 1 0 1 1
# 5  1 0 0 0 1
# 6  1 0 1 1 1
# 7  1 1 0 0 1
# 8  1 1 0 1 1
# 9  0 1 0 1 1
# 10 1 1 0 1 0

Straightforward code:

stack(sapply(dat, sum))
#   values ind
# 1      5   a
# 2      6   b
# 3      1   c
# 4      8   d
# 5      7   e

Thanks @Friede, colSums is clearly better than sapply(dat, sum), not sure why I missed that...

stack(colSums(dat))
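
The question describes the real data as coded 1 and NA rather than 1 and 0, in which case a plain sum returns NA for any column containing a missing value. A minimal sketch under that assumption; the `obs` data frame and its column names are made up for illustration, and `na.rm = TRUE` is what handles the NAs:

```r
# Hypothetical stand-in for the real data: 1 = present, NA = absent
obs <- data.frame(
  Vide      = rep(NA_real_, 4),
  Copepodes = c(1, NA, 1, NA),
  Nereides  = c(1, 1, NA, 1)
)

# na.rm = TRUE drops the NA cells, so each column sum is the number of 1s
freq <- data.frame(Frequency = colSums(obs, na.rm = TRUE))
freq
# Frequency is 0 for Vide, 2 for Copepodes, 3 for Nereides
```

Wrapping the result in `data.frame()` keeps the categories as row names, which is close to the layout asked for in the question.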

2 Comments

Why not stack(colSums(dat))?
That of course is canonical and preferred, thanks @Friede

If we are using the tidyverse, we can summarise (and pivot_longer if needed):

library(dplyr)
library(tidyr)

dat |> 
    summarise(across(everything(), \(x) sum(x, na.rm = TRUE))) |> 
    pivot_longer(everything(), values_to = "Frequency")

With @r2evans' data:

# A tibble: 5 × 2
  name  Frequency
  <chr>     <int>
1 a             5
2 b             6
3 c             1
4 d             8
5 e             7
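
If the default column name `name` or the row order is not what you want, a small variation may help. This is only a sketch, not part of the pipeline above: `obs` is hypothetical sample data coded as 1 and NA, and `Category` is an arbitrary column name chosen for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data: 1 = present, NA = absent
obs <- data.frame(
  a = c(1, NA, 1, 1),
  b = c(NA, 1, NA, NA),
  c = c(1, 1, NA, NA)
)

freq <- obs |>
  # count the 1s in every column, ignoring NA
  summarise(across(everything(), \(x) sum(x, na.rm = TRUE))) |>
  # one row per category, with explicit column names
  pivot_longer(everything(), names_to = "Category", values_to = "Frequency") |>
  # most frequent categories first
  arrange(desc(Frequency))
freq
```

Here `names_to` replaces the default `name` column, and `arrange(desc(Frequency))` puts the most common categories first, which can be convenient for a report table.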

You could also use skimr.

skimr::skim(yourdata)

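This gives you many summary statistics for every variable, including the number of missing and complete values. Note that, as far as I know, the default numeric summary reports the mean (computed with na.rm = TRUE) rather than the sum; for 0/1 data that mean is the proportion of presences.

The result is itself a data frame, so you can modify it further if needed.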
