0

I have a time series of half-hourly observations. I want a running average of the range of each day's measurements. I group by date and get the correct daily ranges, but then everything I can think of to get the running average only works within the daily groups instead of across them -- row-by-row instead of day-by-day -- and since each day only has one range the value I get is always that range again. Here's one example:

library(tidyverse)
library(zoo)
set.seed(3)
dts <- sort(sample(seq(as_datetime("2024-08-15 09:00:00 EDT"), as_datetime("2024-08-22 09:00:00 EDT"), by="hour"), 24))
df <- tibble(dts = dts, temp = sample(10:20, 24, replace=TRUE))
df <- df %>%
  mutate(date = as.Date(dts)) %>%
  group_by(date) %>%
  mutate(tMax = max(temp, na.rm = TRUE), tMin = min(temp, na.rm = TRUE)) %>%
  mutate(range = tMax - tMin) %>%
  mutate(rollRange = rollapply(range, 3, mean, fill=NA))

In the real data there are always many more observations each day than days in the rolling window so there are only NAs for the top of each day, and thereafter identical to range. An added complication is that for other reasons there are a random number of rows per day so I can't just make my window obs/day * days. Do I have to summarize() this out to a separate data frame, only to merge it back in?

With apologies, here's the result of the reprex plus the column desired showing something like what I'm looking for:

   dts                  temp date        tMax  tMin range rollRange desired
   <dttm>              <int> <date>     <int> <int> <int>     <dbl>   <dbl>
 1 2024-08-15 13:00:00    17 2024-08-15    19    17     2        NA   NA   
 2 2024-08-15 20:00:00    19 2024-08-15    19    17     2        NA   NA   
 3 2024-08-16 02:00:00    20 2024-08-16    20    12     8        NA    4   
 4 2024-08-16 04:00:00    16 2024-08-16    20    12     8         8    4   
 5 2024-08-16 06:00:00    12 2024-08-16    20    12     8         8    4   
 6 2024-08-16 20:00:00    14 2024-08-16    20    12     8         8    4   
 7 2024-08-16 21:00:00    16 2024-08-16    20    12     8        NA    4   
 8 2024-08-17 00:00:00    15 2024-08-17    17    15     2        NA    5.33
 9 2024-08-17 08:00:00    17 2024-08-17    17    15     2        NA    5.33
10 2024-08-18 06:00:00    19 2024-08-18    19    13     6        NA    4.33
11 2024-08-18 10:00:00    13 2024-08-18    19    13     6        NA    4.33
12 2024-08-19 16:00:00    10 2024-08-19    15    10     5        NA    6.66
13 2024-08-19 19:00:00    12 2024-08-19    15    10     5         5    6.66
14 2024-08-19 20:00:00    15 2024-08-19    15    10     5        NA    6.66
15 2024-08-20 06:00:00    13 2024-08-20    18    13     5        NA    6.66
16 2024-08-20 08:00:00    18 2024-08-20    18    13     5        NA    6.66
17 2024-08-21 00:00:00    19 2024-08-21    19    10     9        NA    6   
18 2024-08-21 01:00:00    16 2024-08-21    19    10     9         9    6   
19 2024-08-21 04:00:00    19 2024-08-21    19    10     9         9    6   
20 2024-08-21 08:00:00    10 2024-08-21    19    10     9         9    6   
21 2024-08-21 19:00:00    18 2024-08-21    19    10     9         9    6   
22 2024-08-21 21:00:00    10 2024-08-21    19    10     9        NA    6   
23 2024-08-22 05:00:00    18 2024-08-22    18    14     4        NA   NA   
24 2024-08-22 09:00:00    14 2024-08-22    18    14     4        NA   NA  
4
  • Please provide expected output. df |> mutate(date = as.Date(dts)) |> mutate(tMax = max(temp, na.rm = TRUE), tMin = min(temp, na.rm = TRUE), range = tMax - tMin, .by = date) |> ungroup() |> mutate(rollRange = rollapply(range, 3, mean, fill=NA))? Commented Aug 22, 2024 at 15:27
  • @Friede Edited to show what's desired (not to say expected...). Commented Aug 22, 2024 at 18:53
  • @Friede Your suggestion gets it more cooperative about producing values instead of NA, but it's still only calculating the row-by-row rolling mean, not the day-by-day rolling mean. Commented Aug 22, 2024 at 19:02
  • Sorry, cannot help; I do not understand. Commented Aug 22, 2024 at 19:11

1 Answer 1

0

Yes, using rollmean on the summary and joining back seems easiest.

roll <- df %>%   # df has already been grouped
  summarize(rollRange2 = first(range)) %>%
  mutate(rollRange2 = rollmean(rollRange2, 3, fill = NA))

df %>%
  left_join(roll, join_by(date))
Sign up to request clarification or add additional context in comments.

1 Comment

Ah at last! Thanks so much. I had assumed this would be something that comes up enough for there to be a more fluid method that I was just missing somehow -- as well as other questions. But not! And I was not aware of first() -- could've spent another few days trying to discover that.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.