R loop to apply same function to multiple dataframes

Question

I have multiple data frames to which I need to apply the same function (unique).

df1 = data.frame(Bird_ID = c(1:6,7,7,6,2,1))
df2 = data.frame(Bird_ID = c(1:10,7,7,6,2,1,10,9,3))

For each data frame I want to apply the following to get the number of unique individuals:

individuals1 = data.frame(length(unique(df1[,1])))
individuals2 = data.frame(length(unique(df2[,1])))

Here we have 7 and 10 unique IDs. This is easy, but sometimes I have more than two data frames. How can I apply unique to all of them and get a single output data frame that gives the number of unique individuals per data frame, like this:

output = data.frame(Index = c("Unique.ID"), df1 = c(7),df2=c(10))

#index df1 df2
#Unique.ID 7 10

pyg · Accepted Answer · 2021-10-26 10:31:31Z


There are many ways you could achieve this. Here's one approach that uses functions from the dplyr package:

library("dplyr")
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df1 = data.frame(Bird_ID = c(1:6,7,7,6,2,1))
df2 = data.frame(Bird_ID = c(1:10,7,7,6,2,1,10,9,3))

# combine the dataframes into a named list, for convenience
df_list <- list(df1 = df1, df2 = df2)

# bind, group, and summarise
bind_rows(df_list, .id = "df_name") %>%
  group_by(df_name) %>%
  summarise(n_unique = length(unique(Bird_ID)))
#> # A tibble: 2 × 2
#>   df_name n_unique
#>   <chr>      <int>
#> 1 df1            7
#> 2 df2           10

Created on 2021-10-26 by the reprex package (v2.0.1)
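If the one-row, wide layout from the question is needed, the grouped summary can be reshaped afterwards. The following is a minimal sketch rather than part of the original answer: it assumes the tidyr package is available in addition to dplyr, and the name `output` and the label "Unique.ID" are simply taken from the question.

```r
library(dplyr)
library(tidyr)  # assumption: tidyr is installed; used only for pivot_wider()

df1 <- data.frame(Bird_ID = c(1:6, 7, 7, 6, 2, 1))
df2 <- data.frame(Bird_ID = c(1:10, 7, 7, 6, 2, 1, 10, 9, 3))
df_list <- list(df1 = df1, df2 = df2)

output <- bind_rows(df_list, .id = "df_name") %>%
  group_by(df_name) %>%
  summarise(n_unique = n_distinct(Bird_ID)) %>%
  # one column per data frame, one row of counts
  pivot_wider(names_from = df_name, values_from = n_unique) %>%
  # add the label column from the question and move it to the front
  mutate(Index = "Unique.ID") %>%
  relocate(Index)

output
```

Here n_distinct(Bird_ID) is used in place of length(unique(Bird_ID)); for this data the two give the same counts.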


1 Comment

pyg Over a year ago

As used by @Yuriy below, if you have many dataframes it may be easier to compile them into a list using something like mget(ls(envir = globalenv(), pattern = "df[0-9]+"), envir = globalenv()), assuming each dataframe follows a consistent naming pattern.
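A slightly more defensive version of that idea might look like the sketch below. It is an illustration, not the commenter's code: the pattern is anchored so that only names of the form df1, df2, ... match, and Filter(is.data.frame, ...) drops anything that matched by name but is not a data frame. The object `df_note` is a hypothetical decoy added only to show why the anchoring matters.

```r
df1 <- data.frame(Bird_ID = c(1:6, 7, 7, 6, 2, 1))
df2 <- data.frame(Bird_ID = c(1:10, 7, 7, 6, 2, 1, 10, 9, 3))
df_note <- "not a data frame"  # hypothetical decoy: contains "df" but should be ignored

# the environment this code runs in (the global environment in an interactive session)
env <- environment()

# collect objects whose names are exactly "df" followed by digits
candidates <- mget(ls(envir = env, pattern = "^df[0-9]+$"), envir = env)

# keep only the ones that really are data frames
df_list <- Filter(is.data.frame, candidates)

names(df_list)
```

Note that ls() returns names in sorted order, so with ten or more data frames df10 will typically be listed before df2.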

Yuriy Saraykin · Accepted Answer · 2021-10-26 10:42:30Z


df1 = data.frame(Bird_ID = c(1:6,7,7,6,2,1))
df2 = data.frame(Bird_ID = c(1:10,7,7,6,2,1,10,9,3))

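# anchor the pattern so only df1, df2, ... are collected, not every object whose name contains "df"
l <- mget(x = ls(pattern = "^df[0-9]+$"))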

library(tidyverse)
map_df(l, ~n_distinct(.x[[1]]))
#> # A tibble: 1 x 2
#>     df1   df2
#>   <int> <int>
#> 1     7    10

Created on 2021-10-26 by the reprex package (v2.0.1)

Base R alternative:

sapply(l, function(x) length(unique(x[[1]])))

df1 df2 
  7  10
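To get from that named vector to the exact one-row layout shown in the question, the counts can be passed to data.frame() as a list. This is a minimal base R sketch under the same data; `df_list`, `counts`, and `output` are illustrative names, and vapply() is swapped in for sapply() only to guarantee an integer result.

```r
df1 <- data.frame(Bird_ID = c(1:6, 7, 7, 6, 2, 1))
df2 <- data.frame(Bird_ID = c(1:10, 7, 7, 6, 2, 1, 10, 9, 3))
df_list <- list(df1 = df1, df2 = df2)

# number of unique values in the first column of each data frame
counts <- vapply(df_list, function(x) length(unique(x[[1]])), integer(1))

# each element of the named list becomes its own column
output <- data.frame(Index = "Unique.ID", as.list(counts))

print(output)
#>       Index df1 df2
#> 1 Unique.ID   7  10
```

Because the column names come from the names of df_list, adding a third data frame to the list adds a third column without any other change.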

