How to check if all values exist by group in R

Question

I would like to check that a certain set of values is present in every group of my dataset, and output a dataset listing each value I am checking for and whether it exists in each group. How do I do this? For example, using the iris dataset in R, say I want to check whether each species contains the petal lengths 1, 3, and 4. I tried the dplyr summarise() call below, but it collapses the result to a single TRUE/FALSE per group, whereas I would like to know whether each individual value is present.

# load example data
data(iris)

# preview data
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa


# what I want
     Species Petal.Length value_is_present
     setosa            1                Y
 versicolor            1                N
  virginica            1                N
     setosa            3                N
 versicolor            3                Y
  virginica            3                Y
     setosa            4                N
 versicolor            4                N
  virginica            4                N

# what I tried:
library(dplyr)

expected_values <- c(1, 3, 4)

# Check if all expected values exist within each group
result <- iris %>%
  group_by(Species) %>%
  summarise(
    all_values_present = all(expected_values %in% Petal.Length)
  ) %>%
  ungroup()

> print(result)
# A tibble: 3 × 2
  Species    all_values_present
  <fct>      <lgl>             
1 setosa     FALSE             
2 versicolor FALSE             
3 virginica  FALSE

Edit: I made some typos in my want dataset but seems like everyone got the picture. Thanks!
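
For reference, the attempt above returns FALSE for every species because all() collapses the three per-value checks into one result per group; the vector expected_values %in% Petal.Length already holds the per-value answers. A minimal base R sketch of keeping them, assuming only the iris data and the expected_values from the question (the names by_species and result are illustrative, not from the original post):

```r
data(iris)
expected_values <- c(1, 3, 4)

# One logical vector per species: is each expected value among its petal lengths?
by_species <- lapply(split(iris$Petal.Length, iris$Species),
                     function(x) expected_values %in% x)

# Stack into long format: one row per species/value pair
result <- data.frame(
  Species = rep(names(by_species), each = length(expected_values)),
  Petal.Length = rep(expected_values, times = length(by_species)),
  value_is_present = ifelse(unlist(by_species, use.names = FALSE), "Y", "N")
)
result
```

split() orders the groups by factor level, so the rep() calls line up with the unlisted logical values.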

I don't understand your desired output. Why is setosa and length 1 repeated 3 times, once with Y and twice with N? – MrFlick, Commented Oct 23 at 20:34
@MrFlick oops, it was just example output I made up; setosa should have lengths 1, 3, and 4, not 1 repeated three times. – user30397791, Commented Oct 24 at 18:02

Friede · Accepted Answer · 2025-10-24 13:01:48Z

Base R idea:

expand.grid(Species=levels(iris$Species), Petal.Length=c(1, 3, 4)) |>
  sort_by(~Species) |> # cosmetics
  ( \(.) transform(., present = do.call('paste0', .) %in% do.call(
    'paste0', subset(iris, select=c(Species, Petal.Length)))) )()

-output

     Species Petal.Length present
1     setosa            1    TRUE
4     setosa            3   FALSE
7     setosa            4   FALSE
2 versicolor            1   FALSE
5 versicolor            3    TRUE
8 versicolor            4    TRUE
3  virginica            1   FALSE
6  virginica            3   FALSE
9  virginica            4   FALSE
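
The one-liner above matches rows by pasting the two columns into a single key. The same idea can be spelled out in steps, which may be easier to adapt. This is only a sketch: make_key is a hypothetical helper, and the "|" separator is an added precaution (not needed for iris) so that the two columns cannot run together into an ambiguous key.

```r
data(iris)
expected_values <- c(1, 3, 4)

# All Species x expected-value combinations to check
grid <- expand.grid(Species = levels(iris$Species),
                    Petal.Length = expected_values)

# Hypothetical helper: composite key with an explicit separator
make_key <- function(d) paste(d$Species, d$Petal.Length, sep = "|")

# A combination is present if its key occurs among the keys of the data
grid$present <- make_key(grid) %in% make_key(iris)
grid[order(grid$Species), ]
```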

NOTE. We are almost always better off using a logical variable (TRUE/FALSE), or 1/0 obtained by wrapping the expression in +(...), rather than "Y"/"N" strings. It simplifies further analysis a lot. A quick demonstration:

# <...> |>
  aggregate(present~Species, data=_, all)

-output

     Species present
1     setosa   FALSE
2 versicolor   FALSE
3  virginica   FALSE
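
To illustrate the +(...) idiom from the note, here is a small sketch that rebuilds the presence table and counts how many expected values each species contains. The column name present01 and the object n_found are made up for this example.

```r
data(iris)
grid <- expand.grid(Species = levels(iris$Species), Petal.Length = c(1, 3, 4))
grid$present <- paste(grid$Species, grid$Petal.Length) %in%
  paste(iris$Species, iris$Petal.Length)

# Unary plus coerces logical to integer: TRUE -> 1L, FALSE -> 0L
grid$present01 <- +(grid$present)

# With 0/1 values, per-group summaries such as counts are one-liners
n_found <- aggregate(present01 ~ Species, data = grid, FUN = sum)
n_found
```

Swapping FUN = sum for mean would give the share of expected values found per species instead of the count.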

MrFlick · Accepted Answer · 2025-10-25 03:32:53Z

I think you can get what you want by counting the number of rows for each group and value, and then converting those counts to Y/N values. How about

iris %>% 
  filter(Petal.Length %in% expected_values) %>% 
  mutate(Petal.Length=factor(Petal.Length)) %>% 
  count(Species, Petal.Length, .drop=FALSE) %>% 
  mutate(value_is_present = if_else(n>0, "Y", "N"), n=NULL)

which returns

     Species Petal.Length value_is_present
1     setosa            1                Y
2     setosa            3                N
3     setosa            4                N
4 versicolor            1                N
5 versicolor            3                Y
6 versicolor            4                Y
7  virginica            1                N
8  virginica            3                N
9  virginica            4                N

edited Oct 25 at 3:32

answered Oct 23 at 20:42


knitz3 Oct 29 at 18:14

I find this pattern easiest to write on the fly and find dplyr::count() very useful. If you specify factor levels like this mutate(Petal.Length = factor(Petal.Length, levels = expected_values)), I believe this will protect against the case where a value in expected_values does not appear in the dataset at all.
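
A sketch of the variant described in this comment, assuming dplyr is installed. The value 99 is a hypothetical addition to expected_values, chosen because it never occurs in iris; with explicit factor levels it still gets a row for every species.

```r
library(dplyr)

data(iris)
# 99 is a made-up value that does not appear anywhere in iris
expected_values <- c(1, 3, 4, 99)

result <- iris %>%
  filter(Petal.Length %in% expected_values) %>%
  # Explicit levels keep values that were filtered out entirely
  mutate(Petal.Length = factor(Petal.Length, levels = expected_values)) %>%
  count(Species, Petal.Length, .drop = FALSE) %>%
  mutate(value_is_present = if_else(n > 0, "Y", "N"), n = NULL)

result
```

Without the levels argument, the factor would only know the values that survived the filter, so the rows for 99 would be missing from the result.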

Andre Wildberg · Accepted Answer · 2025-10-23 21:14:04Z

Looping over the expected_values per group using purrr::map, then spreading the resulting list into columns with tidyr::unnest_wider

library(dplyr)
library(tidyr)

expected_values <- c(1, 3, 4)

iris %>% 
  reframe(value = purrr::map(expected_values, ~ 
            list(Petal.length = .x, 
                 value_is_present = c("N","Y")[any(.x == Petal.Length) + 1])),
          .by = Species) %>% 
  unnest_wider(value)

output

# A tibble: 9 × 3
  Species    Petal.length value_is_present
  <fct>             <dbl> <chr>        
1 setosa                1 Y            
2 setosa                3 N            
3 setosa                4 N            
4 versicolor            1 N            
5 versicolor            3 Y            
6 versicolor            4 Y            
7 virginica             1 N            
8 virginica             3 N            
9 virginica             4 N

Jon Spring · Accepted Answer · 2025-10-23 22:00:37Z

Another variation. Get the distinct Species/Petal.Length values and mark present, add rows for the missing expected_values, and remove the non-expected_values.

iris |>
  distinct(Species, Petal.Length, present = TRUE) |>
  complete(Species, Petal.Length = expected_values, fill = list(present = FALSE)) |>
  filter(Petal.Length %in% expected_values)

  Species    Petal.Length present
  <fct>             <dbl> <lgl>  
1 setosa                1 TRUE   
2 setosa                3 FALSE  
3 setosa                4 FALSE  
4 versicolor            1 FALSE  
5 versicolor            3 TRUE   
6 versicolor            4 TRUE   
7 virginica             1 FALSE  
8 virginica             3 FALSE  
9 virginica             4 FALSE

Or we could set up a table of the Species and expected values and set present FALSE, then update that table with a version of iris where present is TRUE:

iris |>
  reframe(Petal.Length = expected_values, present = FALSE, .by = Species) |>
  rows_update(iris |> distinct(Species, Petal.Length) |> mutate(present = TRUE), 
              by = c("Species", "Petal.Length"), unmatched = "ignore")

bretauv · Accepted Answer · 2025-10-23 21:33:31Z

Yet another alternative: build a data.frame with your expected values and join your actual values to it.

In the code below, starting from the expected values:

1. We join to the actual data. If a group-value pair doesn't exist in the data to check, "value_is_present" will be NA for that row.
2. If a group-value pair exists several times, the same row ends up duplicated, so we call distinct() to drop the duplicates.
3. We replace the NA generated in the first step with "N".

library(dplyr, warn.conflicts = FALSE)

data_to_check <- iris |> 
  select(Species, Petal.Length) |> 
  mutate(value_is_present = "Y")

expected_values <- expand.grid(unique(iris$Species), c(1, 3, 4)) |> 
  arrange(Var1, Var2)
names(expected_values) <- c("Species", "Petal.Length")

expected_values |> 
  left_join(data_to_check, join_by(Species, Petal.Length)) |> 
  distinct() |> 
  mutate(value_is_present = if_else(is.na(value_is_present), "N", value_is_present))
#>      Species Petal.Length value_is_present
#> 1     setosa            1                Y
#> 2     setosa            3                N
#> 3     setosa            4                N
#> 4 versicolor            1                N
#> 5 versicolor            3                Y
#> 6 versicolor            4                Y
#> 7  virginica            1                N
#> 8  virginica            3                N
#> 9  virginica            4                N

You can name the arguments directly, as in expand.grid(Species=.., Petal.Length=c(1,3,4)); then there is no need to rename the columns afterwards.
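
A short sketch of that suggestion, written in base R so it runs without dplyr. The object names expected, observed, and checked are illustrative, and merge() stands in for left_join() here.

```r
data(iris)

# Naming the arguments gives the columns their final names straight away
expected <- expand.grid(Species = unique(iris$Species),
                        Petal.Length = c(1, 3, 4))
names(expected)

# Distinct observed pairs, flagged as present
observed <- unique(iris[c("Species", "Petal.Length")])
observed$value_is_present <- "Y"

# Left join on the shared columns; unmatched pairs get NA, then "N"
checked <- merge(expected, observed, all.x = TRUE)
checked$value_is_present[is.na(checked$value_is_present)] <- "N"
checked <- checked[order(checked$Species, checked$Petal.Length), ]
checked
```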
