I have multiple data frames to which I need to apply the same function (unique):

df1 = data.frame(Bird_ID = c(1:6,7,7,6,2,1))
df2 = data.frame(Bird_ID = c(1:10,7,7,6,2,1,10,9,3))

For each data frame I use the following to count the unique individuals:

individuals1 = data.frame(length(unique(df1[,1])))
individuals2 = data.frame(length(unique(df2[,1])))

Here there are 7 and 10 unique IDs, respectively. That is easy with two data frames, but sometimes I have many more. How can I apply the unique function to all of the data frames and get a single output data frame with the number of unique individuals per data frame, like this:

output = data.frame(Index = "Unique.ID", df1 = 7, df2 = 10)

#       Index df1 df2
# 1 Unique.ID   7  10

2 Answers

There are many ways to achieve this. Here's one approach that uses functions from the dplyr package:

library("dplyr")
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df1 = data.frame(Bird_ID = c(1:6,7,7,6,2,1))
df2 = data.frame(Bird_ID = c(1:10,7,7,6,2,1,10,9,3))

# combine the dataframes into a named list, for convenience
df_list <- list(df1 = df1, df2 = df2)

# bind, group, and summarise
bind_rows(df_list, .id = "df_name") %>%
  group_by(df_name) %>%
  summarise(n_unique = length(unique(Bird_ID)))
#> # A tibble: 2 × 2
#>   df_name n_unique
#>   <chr>      <int>
#> 1 df1            7
#> 2 df2           10

Created on 2021-10-26 by the reprex package (v2.0.1)
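The summary above is in long format (one row per data frame). To get the one-row layout asked for in the question, one option is to pivot the summary wide. This is a minimal sketch, assuming tidyr >= 1.0 (for pivot_wider()) and dplyr >= 1.0 (for the .before argument of mutate()); the object name output is illustrative:

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(Bird_ID = c(1:6, 7, 7, 6, 2, 1))
df2 <- data.frame(Bird_ID = c(1:10, 7, 7, 6, 2, 1, 10, 9, 3))
df_list <- list(df1 = df1, df2 = df2)

output <- bind_rows(df_list, .id = "df_name") %>%
  group_by(df_name) %>%
  # n_distinct(x) is dplyr's shorthand for length(unique(x))
  summarise(n_unique = n_distinct(Bird_ID)) %>%
  # one column per data frame, holding its count of unique IDs
  pivot_wider(names_from = df_name, values_from = n_unique) %>%
  # add the label column from the question as the first column
  mutate(Index = "Unique.ID", .before = 1)

output
```

Because the columns come from the list names, adding df3, df4, ... to df_list adds matching columns without changing the pipeline.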

As @Yuriy does below, if you have many data frames it may be easier to collect them into a list with something like mget(ls(envir = globalenv(), pattern = "df[0-9]+"), envir = globalenv()) (assuming the data frames follow a consistent naming pattern).

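A quick illustration of why the naming pattern in the comment above matters: ls() treats pattern as a regular expression, so a loose pattern such as "df" also picks up helper objects like df_list. This sketch assumes the code runs at the top level of a fresh session, so that globalenv() holds only these objects; the names loose, strict and collected are illustrative:

```r
df1 <- data.frame(Bird_ID = c(1:6, 7, 7, 6, 2, 1))
df2 <- data.frame(Bird_ID = c(1:10, 7, 7, 6, 2, 1, 10, 9, 3))
df_list <- list(df1 = df1, df2 = df2)  # unrelated object whose name also contains "df"

# Loose pattern: matches every object name containing "df", including df_list
loose <- ls(envir = globalenv(), pattern = "df")

# Anchored pattern: only names of the form df1, df2, ..., dfN
strict <- ls(envir = globalenv(), pattern = "^df[0-9]+$")

# Named list of data frames, ready for sapply() / map_df()
collected <- mget(strict, envir = globalenv())
names(collected)
```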
df1 = data.frame(Bird_ID = c(1:6,7,7,6,2,1))
df2 = data.frame(Bird_ID = c(1:10,7,7,6,2,1,10,9,3))

# anchored regex: matches df1, df2, ... but not e.g. df_list
l <- mget(x = ls(pattern = "^df[0-9]+$"))

library(tidyverse)
map_df(l, ~n_distinct(.x[[1]]))
#> # A tibble: 1 x 2
#>     df1   df2
#>   <int> <int>
#> 1     7    10

Created on 2021-10-26 by the reprex package (v2.0.1)

In base R:

sapply(l, function(x) length(unique(x[[1]])))

df1 df2 
  7  10 
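If you need exactly the one-row data frame from the question rather than a named vector, the sapply() result can be turned into columns with as.list(). This is a minimal base-R sketch; l is assumed to be a named list of data frames as above, and output is an illustrative name:

```r
df1 <- data.frame(Bird_ID = c(1:6, 7, 7, 6, 2, 1))
df2 <- data.frame(Bird_ID = c(1:10, 7, 7, 6, 2, 1, 10, 9, 3))
l <- list(df1 = df1, df2 = df2)

# Named integer vector: one count of unique IDs per data frame
counts <- sapply(l, function(x) length(unique(x[[1]])))

# as.list() makes each named count its own column, after the Index label
output <- data.frame(Index = "Unique.ID", as.list(counts))
output
```

The column names come from names(l), so the same two lines work for any number of data frames.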
