I’m storing IoT readings in a GridDB container and need one row per hour with the true average of the points that actually fall inside each hour (not interpolated values):
ts_bucket avg_temp min_temp max_temp n_rows
------------------------------------------------------------
2025-09-01T00:00:00Z 25.40 25.40 25.40 1
2025-09-01T01:00:00Z 26.10 26.10 26.10 1
2025-09-01T02:00:00Z 27.80 27.80 27.80 1
2025-09-01T03:00:00Z 28.20 28.20 28.20 1
...
2025-09-02T09:00:00Z 27.10 27.10 27.10 1
If an hour has no rows, either omit it or output NULL (or an interpolated value).
Values arrive at irregular times, and each reading is assumed to hold until the next one, so a time-weighted average is needed. For one hour:
-- within 2025-09-01 01:00–02:00
(ts, temperature)
2025-09-01T01:00:00Z 20
2025-09-01T01:10:00Z 35
2025-09-01T01:50:00Z 20
Expected time-weighted average for 01:00–02:00:
((10 min * 20) + (40 min * 35) + (10 min * 20)) / 60
= (200 + 1400 + 200) / 60
= 30.0
This is exactly what TIME_AVG(...) returns for a whole range; I need the same computation bucketed hourly (one row per hour across a larger range).
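To make the target computation concrete, here is a minimal Python sketch of the zero-order-hold weighting described above (each reading holds until the next one), applied per hourly bucket. The helper names (`time_avg_bucket`, `time_avg_per_hour`) are hypothetical, not GridDB API, and the sketch assumes a reading exists at each bucket start, as in the example:

```python
from datetime import datetime, timedelta, timezone

def time_avg_bucket(samples, bucket_start, bucket_end):
    """Time-weighted average over [bucket_start, bucket_end), assuming each
    reading holds until the next one (zero-order hold). Returns None for an
    empty bucket. Assumes a sample at the bucket start covers the first gap."""
    inside = sorted((ts, v) for ts, v in samples if bucket_start <= ts < bucket_end)
    if not inside:
        return None
    total = 0.0
    for i, (ts, v) in enumerate(inside):
        # Each value is weighted by the time until the next sample
        # (or the bucket end for the last sample).
        nxt = inside[i + 1][0] if i + 1 < len(inside) else bucket_end
        total += (nxt - ts).total_seconds() * v
    return total / (bucket_end - bucket_start).total_seconds()

def time_avg_per_hour(samples, range_start, range_end):
    """One (bucket_start, time_avg) pair per hour across a larger range."""
    out = []
    t = range_start
    while t < range_end:
        out.append((t, time_avg_bucket(samples, t, t + timedelta(hours=1))))
        t += timedelta(hours=1)
    return out
```

Running `time_avg_bucket` on the three readings above for 01:00–02:00 reproduces the 30.0 worked out by hand.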
Schema TimeSeries:
ts (TIMESTAMP, row key)
deviceid (STRING)
temperature (DOUBLE)
humidity (DOUBLE)
status (STRING)
Data:
INSERT INTO TSDB (ts, deviceid, temperature, humidity, status) VALUES
(TIMESTAMP('2025-09-01T00:00:00Z'), 'dev-001', 25.4, 35.0, 'OK'),
(TIMESTAMP('2025-09-01T01:00:00Z'), 'dev-001', 26.1, 42.0, 'OK'),
(TIMESTAMP('2025-09-01T02:00:00Z'), 'dev-001', 27.8, 48.0, 'WARN'),
(TIMESTAMP('2025-09-01T03:00:00Z'), 'dev-001', 28.2, 38.0, 'OK'),
(TIMESTAMP('2025-09-02T00:00:00Z'), 'dev-002', 23.5, 33.0, 'OK'),
(TIMESTAMP('2025-09-02T01:00:00Z'), 'dev-002', 24.0, 46.0, 'WARN'),
(TIMESTAMP('2025-09-02T02:00:00Z'), 'dev-002', 22.8, 41.0, 'OK'),
(TIMESTAMP('2025-09-02T03:00:00Z'), 'dev-002', 21.9, 37.0, 'OK');
What I tried in TQL:
-- Single weighted average for the whole range (works, but not per hour)
SELECT TIME_AVG(temperature)
WHERE ts >= TIMESTAMP('2025-09-01T00:00:00Z')
AND ts < TIMESTAMP('2025-09-03T00:00:00Z');
-- Hourly sampling returns interpolated values, not true bucket aggregates
SELECT TIME_SAMPLING(
temperature,
TIMESTAMP('2025-09-01T00:00:00Z'),
TIMESTAMP('2025-09-03T00:00:00Z'),
1, HOUR
);
TQL doesn’t support GROUP BY, so I couldn’t express per-hour roll-ups as a single TQL statement. In SQL, however, this produces the true per-hour aggregates in one query:
SELECT
ts, -- bucket start time
AVG(temperature) AS avg_temp,
MIN(temperature) AS min_temp,
MAX(temperature) AS max_temp,
COUNT(*) AS n_rows
FROM TSDB
WHERE ts >= TIMESTAMP('2025-09-01T00:00:00Z')
AND ts < TIMESTAMP('2025-09-03T00:00:00Z')
GROUP BY RANGE (ts) EVERY (1, HOUR)
ORDER BY ts;
To include hours with no data (or to interpolate/forward-fill), this variant works:
SELECT ts, AVG(temperature) AS avg_temp
FROM TSDB
WHERE ts >= TIMESTAMP('2025-09-01T00:00:00Z')
AND ts < TIMESTAMP('2025-09-03T00:00:00Z')
GROUP BY RANGE (ts) EVERY (1, HOUR) FILL (NULL) -- or FILL (LINEAR), FILL (PREVIOUS)
ORDER BY ts;
- Is there a TQL-only way to return one row per hour with AVG/MIN/MAX/COUNT over a time range?
- If not, is SQL with GROUP BY RANGE the right approach, or should I fall back to client-side bucketing when restricted to TQL?
- Any performance tips for millions of rows?
Without built-in support I would fall back to the LAG function (or even a recursive query to iterate through the rows). So if GridDB offers an out-of-the-box solution for time series, that may perform much better. But as mentioned, I wouldn't know.
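When restricted to TQL, the client-side fallback mentioned above can be sketched as follows: fetch the raw rows for the range with a plain TQL query, then bucket and aggregate in the client. This is a minimal Python sketch operating on an in-memory list of (ts, value) pairs; the fetch itself via the GridDB client is omitted, and `hourly_buckets` is a hypothetical helper, not a GridDB API:

```python
from collections import defaultdict
from datetime import datetime, timezone

def hourly_buckets(rows):
    """Plain (unweighted) per-hour AVG/MIN/MAX/COUNT over (ts, value) rows,
    e.g. the result of a TQL range query. Returns a dict keyed by bucket
    start, with (avg, min, max, count) tuples; empty hours are simply absent."""
    buckets = defaultdict(list)
    for ts, value in rows:
        # Truncate the timestamp to the top of its hour to get the bucket key.
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {
        start: (sum(vs) / len(vs), min(vs), max(vs), len(vs))
        for start, vs in sorted(buckets.items())
    }
```

One pass over the rows and a dict keyed by truncated hour keeps this O(n); for millions of rows the main cost is usually transferring the raw rows to the client, which is exactly what a server-side GROUP BY RANGE avoids.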