rolling average across grouped dates

Question

I have a time series of half-hourly observations. I want a running average of the range of each day's measurements. I group by date and get the correct daily ranges, but then everything I can think of to get the running average only works within the daily groups instead of across them -- row-by-row instead of day-by-day -- and since each day only has one range the value I get is always that range again. Here's one example:

library(tidyverse)
library(zoo)
set.seed(3)
dts <- sort(sample(seq(as_datetime("2024-08-15 09:00:00 EDT"), as_datetime("2024-08-22 09:00:00 EDT"), by="hour"), 24))
df <- tibble(dts = dts, temp = sample(10:20, 24, replace=TRUE))
df <- df %>%
  mutate(date = as.Date(dts)) %>%
  group_by(date) %>%
  mutate(tMax = max(temp, na.rm = TRUE), tMin = min(temp, na.rm = TRUE)) %>%
  mutate(range = tMax - tMin) %>%
  mutate(rollRange = rollapply(range, 3, mean, fill=NA))

In the real data there are always many more observations each day than days in the rolling window so there are only NAs for the top of each day, and thereafter identical to range. An added complication is that for other reasons there are a random number of rows per day so I can't just make my window obs/day * days. Do I have to summarize() this out to a separate data frame, only to merge it back in?

With apologies, here's the result of the reprex plus the column desired showing something like what I'm looking for:

   dts                  temp date        tMax  tMin range rollRange desired
   <dttm>              <int> <date>     <int> <int> <int>     <dbl>   <dbl>
 1 2024-08-15 13:00:00    17 2024-08-15    19    17     2        NA   NA   
 2 2024-08-15 20:00:00    19 2024-08-15    19    17     2        NA   NA   
 3 2024-08-16 02:00:00    20 2024-08-16    20    12     8        NA    4   
 4 2024-08-16 04:00:00    16 2024-08-16    20    12     8         8    4   
 5 2024-08-16 06:00:00    12 2024-08-16    20    12     8         8    4   
 6 2024-08-16 20:00:00    14 2024-08-16    20    12     8         8    4   
 7 2024-08-16 21:00:00    16 2024-08-16    20    12     8        NA    4   
 8 2024-08-17 00:00:00    15 2024-08-17    17    15     2        NA    5.33
 9 2024-08-17 08:00:00    17 2024-08-17    17    15     2        NA    5.33
10 2024-08-18 06:00:00    19 2024-08-18    19    13     6        NA    4.33
11 2024-08-18 10:00:00    13 2024-08-18    19    13     6        NA    4.33
12 2024-08-19 16:00:00    10 2024-08-19    15    10     5        NA    6.66
13 2024-08-19 19:00:00    12 2024-08-19    15    10     5         5    6.66
14 2024-08-19 20:00:00    15 2024-08-19    15    10     5        NA    6.66
15 2024-08-20 06:00:00    13 2024-08-20    18    13     5        NA    6.66
16 2024-08-20 08:00:00    18 2024-08-20    18    13     5        NA    6.66
17 2024-08-21 00:00:00    19 2024-08-21    19    10     9        NA    6   
18 2024-08-21 01:00:00    16 2024-08-21    19    10     9         9    6   
19 2024-08-21 04:00:00    19 2024-08-21    19    10     9         9    6   
20 2024-08-21 08:00:00    10 2024-08-21    19    10     9         9    6   
21 2024-08-21 19:00:00    18 2024-08-21    19    10     9         9    6   
22 2024-08-21 21:00:00    10 2024-08-21    19    10     9        NA    6   
23 2024-08-22 05:00:00    18 2024-08-22    18    14     4        NA   NA   
24 2024-08-22 09:00:00    14 2024-08-22    18    14     4        NA   NA

Please provide expected output. df |> mutate(date = as.Date(dts)) |> mutate(tMax = max(temp, na.rm = TRUE), tMin = min(temp, na.rm = TRUE), range = tMax - tMin, .by = date) |> ungroup() |> mutate(rollRange = rollapply(range, 3, mean, fill=NA))? — Friede
– Friede, Commented Aug 22, 2024 at 15:27
@Friede Edited to show what's desired (not to say expected...). — uhClem
– uhClem, Commented Aug 22, 2024 at 18:53
@Friede Your suggestion gets it more cooperative about producing values instead of NA, but it's still only calculating the row-by-row rolling mean, not the day-by-day rolling mean. — uhClem
– uhClem, Commented Aug 22, 2024 at 19:02

G. Grothendieck · Accepted Answer · 2024-08-22 23:53:45Z

0

Yes, using rollmean on the summary and joining back seems easiest.

roll <- df %>%   # df has already been grouped
  summarize(rollRange2 = first(range)) %>%
  mutate(rollRange2 = rollmean(rollRange2, 3, fill = NA))

df %>%
  left_join(roll, join_by(date))

answered Aug 22, 2024 at 23:53

G. Grothendieck

273k18 gold badges221 silver badges365 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

uhClem Over a year ago

Ah at last! Thanks so much. I had assumed this would be something that comes up enough for there to be a more fluid method that I was just missing somehow -- as well as other questions. But not! And I was not aware of first() -- could've spent another few days trying to discover that.

Collectives™ on Stack Overflow

rolling average across grouped dates

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related