I have a panel dataset at the district level for Germany covering three years (2013, 2017, 2021). I want to lag one of my variables, but the lag function in R only returns the same values from the same period, i.e. the result is not lagged.

I am confused. Why does the lag function not work as it should and how can I fix it?

This is the code that I used and the output that it generated:

library(tidyverse)

> data %>%
+   select(Kennziffer, Jahr, Kreis, per_foreign_noeduc) %>%
+   arrange(Kennziffer, Jahr) %>%
+   group_by(Kennziffer) %>%
+   mutate(lagged_per_foreign_noeduc = lag(per_foreign_noeduc, n = 1, default = NA))
# A tibble: 1,200 × 5
# Groups:   Kennziffer [400]
   Kennziffer  Jahr Kreis                                    per_foreign_noeduc lagged_per_foreign_noeduc
        <dbl> <dbl> <chr>                                                 <dbl>                     <dbl>
 1          2  2013 Hamburg                                               1.78                      1.78 
 2          2  2017 Hamburg                                               2.45                      2.45 
 3          2  2021 Hamburg                                               3.19                      3.19 
 4         11  2013 Berlin                                                1.44                      1.44 
 5         11  2017 Berlin                                                2.30                      2.30 
 6         11  2021 Berlin                                                2.88                      2.88 
 7       1001  2013 Flensburg, kreisfreie Stadt                           0.820                     0.820
 8       1001  2017 Flensburg, kreisfreie Stadt                           1.46                      1.46 
 9       1001  2021 Flensburg, kreisfreie Stadt                           2.25                      2.25 
10       1002  2013 Kiel, Landeshauptstadt, kreisfreie Stadt              0.761                     0.761
# ℹ 1,190 more rows
# ℹ Use `print(n = ...)` to see more rows
  • It would be easier to help you if you provided a minimal reproducible example including a snippet of your input data shared using dput(), e.g. run data |> dplyr::select(Kennziffer, Jahr, Kreis, per_foreign_noeduc) |> head(15) |> dput() and add the output to your post. Commented Aug 19, 2024 at 13:26
  • You might also try dplyr::mutate and dplyr::lag to ensure that you are using the functions from dplyr. See e.g. stackoverflow.com/questions/28235074/… for a related issue. Commented Aug 19, 2024 at 13:27
  • Thank you @stefan! dplyr::lag works! I was not aware of the ambiguities. Commented Aug 19, 2024 at 13:37
  • I'm curious if this question is reproducible as provided. When I run library(tidyverse), I get a message noting that stats::filter() and stats::lag() are masked. In what circumstances would this not happen? Are there other packages which the OP might have loaded which would mask dplyr's lag() after it was loaded? Commented Aug 19, 2024 at 17:49

1 Answer

Please provide reproducible data as discussed at the top of the tag page. We have attempted to provide data in the Note at the end and dplyr::lag works as expected.

Note that base R's lag (stats::lag) works differently: it expects a ts or other time series object and shifts the time index rather than the values, whereas dplyr's lag shifts the values of a vector such as a column in a data.frame. To be sure you are using the dplyr version you can write dplyr::lag explicitly, although normally that is not needed.
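To make the difference concrete, here is a minimal sketch that calls both functions on the Hamburg values from the question. The vector x and the names base_lag and dplyr_lag are only illustrative; the sketch assumes dplyr is installed.

```r
# Three values for one district, taken from the question's output
x <- c(1.78, 2.45, 3.19)

# stats::lag() keeps the values and only attaches/shifts a time-series
# attribute ("tsp"), so the numbers look "not lagged"
base_lag <- stats::lag(x, 1)
as.numeric(base_lag)    # 1.78 2.45 3.19 (values unchanged)
attr(base_lag, "tsp")   # the shifted time base lives here

# dplyr::lag() shifts the values themselves and pads with NA
dplyr_lag <- dplyr::lag(x, n = 1)
dplyr_lag               # NA 1.78 2.45
```

This would be consistent with the output in the question, where the "lagged" column repeats the original column: the extra arguments n and default are silently absorbed by stats::lag, and only the hidden time attribute changes.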

library(dplyr)

data %>%
  group_by(Kennziffer) %>%
  mutate(lagged_per_foreign_noeduc = lag(per_foreign_noeduc, n = 1, default = NA))

giving

# A tibble: 10 × 5
# Groups:   Kennziffer [4]
   Kennziffer  Jahr Kreis              per_foreign_noeduc lagged_per_foreign_n…¹
        <int> <int> <chr>                           <dbl>                  <dbl>
 1          2  2013 Hamburg                         1.78                   NA   
 2          2  2017 Hamburg                         2.45                    1.78
 3          2  2021 Hamburg                         3.19                    2.45
 4         11  2013 Berlin                          1.44                   NA   
 5         11  2017 Berlin                          2.3                     1.44
 6         11  2021 Berlin                          2.88                    2.3 
 7       1001  2013 Flensburg, kreisf…              0.82                   NA   
 8       1001  2017 Flensburg, kreisf…              1.46                    0.82
 9       1001  2021 Flensburg, kreisf…              2.25                    1.46
10       1002  2013 Kiel, Landeshaupt…              0.761                  NA   
# ℹ abbreviated name: ¹​lagged_per_foreign_noeduc

Note

data <- data.frame(
  Kennziffer = rep(c(2L, 11L, 1001L, 1002L), c(3L, 3L, 3L, 1L)),
  Jahr = c(2013L, 2017L, 2021L, 2013L, 2017L, 2021L, 2013L, 2017L, 2021L, 2013L),
  Kreis = rep(c("Hamburg", "Berlin", "Flensburg, kreisfreie Stadt",
      "Kiel, Landeshauptstadt, kreisfreie Stadt"), c(3L, 3L, 3L, 1L)),
  per_foreign_noeduc = c(1.78, 2.45, 3.19, 1.44, 2.3, 2.88, 0.82, 1.46, 2.25, 0.761)
)

2 Comments

Thanks, dplyr::lag actually does the trick. I was not aware of the conflicts between the packages and thought there was only one lag function.
When library(tidyverse) was run it should have shown a message warning that lag was in conflict.
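
If the startup message was missed, one way to check which lag a session actually resolves to is sketched below. It uses only base R helpers plus dplyr. The final assignment is an optional workflow choice assumed here for illustration, not something the original poster did.

```r
library(dplyr)

# Search-path entries that define an object called "lag", in lookup order;
# the first entry is the one a bare lag() call will use
find("lag")

# Namespace of the function that a bare lag() currently resolves to
# ("dplyr" if dplyr is attached ahead of stats, "stats" otherwise)
environmentName(environment(lag))

# Optional: pin the dplyr version for the rest of the script, so that
# packages attached later cannot change what lag() means
lag <- dplyr::lag
```

Packages attached later sit earlier on the search path, so any of them that exports its own lag would take precedence over dplyr's. Writing dplyr::lag at the call site avoids depending on attachment order at all.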
