I would like to check that a certain set of values appears in every group of my dataset, and output a dataset listing each value I am checking for and whether it exists in each group. How do I do this? For example, using R's built-in iris dataset, say I want to check whether each species contains the petal lengths 1, 3, and 4. I tried the dplyr summarise call below, but it collapses the result to a single TRUE or FALSE per group; I would like to know whether each individual value is present.
# load example data
data(iris)
# preview data
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
# what I want
Species Petal.Length value_is_present
setosa 1 Y
versicolor 1 N
virginica 1 N
setosa 3 N
versicolor 3 Y
virginica 3 Y
setosa 4 N
versicolor 4 N
virginica 4 N
# what I tried:
library(dplyr)

expected_values <- c(1, 3, 4)
# Check if all expected values exist within each group
result <- iris %>%
  group_by(Species) %>%
  summarise(
    all_values_present = all(expected_values %in% Petal.Length)
  ) %>%
  ungroup()
> print(result)
# A tibble: 3 × 2
Species all_values_present
<fct> <lgl>
1 setosa FALSE
2 versicolor FALSE
3 virginica FALSE
Edit: I made some typos in my desired output, but it seems like everyone got the picture. Thanks!
setosa and length 1 repeated 3 times, once with Y and twice with N?
setosa should have length 1, 3, and 4, not 1 repeated three times.
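One possible way to get a row per species and expected value, rather than a single TRUE/FALSE per group, is sketched below. It assumes dplyr 1.1.0 or later, since it relies on `reframe()`, which allows each group to return several rows. The presence column is computed first on purpose: once a new `Petal.Length` column is created inside `reframe()`, later expressions would see that new column instead of the original data. Note also that `%in%` does exact matching, so for measured doubles in other datasets a rounding step may be needed first.

```r
library(dplyr)

data(iris)
expected_values <- c(1, 3, 4)

result <- iris %>%
  group_by(Species) %>%
  reframe(
    # Computed first, while Petal.Length still refers to the original column
    value_is_present = if_else(expected_values %in% Petal.Length, "Y", "N"),
    # One row per expected value within each species
    Petal.Length = expected_values
  ) %>%
  # Match the column and row order of the desired output
  select(Species, Petal.Length, value_is_present) %>%
  arrange(Petal.Length, Species)

print(result)
```

This should give nine rows (three species times three expected values), with `value_is_present` set to "Y" or "N" for each combination.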