I have a panel dataset at the district level from Germany covering three years (2013, 2017, 2021). I want to lag one of my variables, but the lag function from R just returns the same values from the same period, i.e. nothing is lagged.
I am confused. Why does the lag function not work as it should and how can I fix it?
This is the code that I used and the output that it generated:
library(tidyverse)
> data %>%
+ select(Kennziffer, Jahr, Kreis, per_foreign_noeduc) %>%
+ arrange(Kennziffer, Jahr) %>%
+ group_by(Kennziffer) %>%
+ mutate(lagged_per_foreign_noeduc = lag(per_foreign_noeduc, n = 1, default = NA))
# A tibble: 1,200 × 5
# Groups: Kennziffer [400]
Kennziffer Jahr Kreis per_foreign_noeduc lagged_per_foreign_noeduc
<dbl> <dbl> <chr> <dbl> <dbl>
1 2 2013 Hamburg 1.78 1.78
2 2 2017 Hamburg 2.45 2.45
3 2 2021 Hamburg 3.19 3.19
4 11 2013 Berlin 1.44 1.44
5 11 2017 Berlin 2.30 2.30
6 11 2021 Berlin 2.88 2.88
7 1001 2013 Flensburg, kreisfreie Stadt 0.820 0.820
8 1001 2017 Flensburg, kreisfreie Stadt 1.46 1.46
9 1001 2021 Flensburg, kreisfreie Stadt 2.25 2.25
10 1002 2013 Kiel, Landeshauptstadt, kreisfreie Stadt 0.761 0.761
# ℹ 1,190 more rows
# ℹ Use `print(n = ...)` to see more rows
dput(), e.g. run data |> dplyr::select(Kennziffer, Jahr, Kreis, per_foreign_noeduc) |> head(15) |> dput() and add the output to your post.
dplyr::mutate and dplyr::lag to ensure that you are using the functions from dplyr. See e.g. stackoverflow.com/questions/28235074/… for a related issue.
dplyr::lag works! I was not aware of the ambiguities.
library(tidyverse), I get a message noting that stats::filter() and stats::lag() are masked. In what circumstances would this not happen? Are there other packages which the OP might have loaded which would mask dplyr's lag() after it was loaded?
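For anyone landing here later, a minimal sketch of the namespace-qualified version suggested in the comments. The toy tibble is an assumption: it only copies the first six rows of the printed output above, and the names toy and lagged are made up for illustration. The first check shows which package a bare lag resolves to in the current session; it does not explain why it was masked in the OP's session.

```r
library(dplyr)

# Toy data (assumption): first six rows of the output printed in the question
toy <- tibble(
  Kennziffer = c(2, 2, 2, 11, 11, 11),
  Jahr = c(2013, 2017, 2021, 2013, 2017, 2021),
  per_foreign_noeduc = c(1.78, 2.45, 3.19, 1.44, 2.30, 2.88)
)

# Diagnostic: which package does a bare `lag` come from right now?
# "dplyr" means dplyr's version is found first, "stats" means stats::lag is.
environmentName(environment(lag))

# Qualify every verb so masking by another package cannot change the result
lagged <- toy %>%
  dplyr::arrange(Kennziffer, Jahr) %>%
  dplyr::group_by(Kennziffer) %>%
  dplyr::mutate(
    lagged_per_foreign_noeduc = dplyr::lag(per_foreign_noeduc, n = 1)
  ) %>%
  dplyr::ungroup()

lagged
```

With the qualified call, the first year of each Kennziffer should come out as NA and every later year should carry the previous year's value (for example 1.78 in the 2017 row for Hamburg).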