1

The goal is to find an equivalent of DISTINCT inside window COUNT for a sliding/cumulative window. According to COUNT:

When this function is called as a window function with an OVER clause that contains an ORDER BY clause:

  • Using the keyword DISTINCT inside the window function is prohibited and results in a compile-time error.

Sample:

SELECT *, COUNT(DISTINCT val) OVER(ORDER BY id)
FROM VALUES (0, NULL),(1,10),(2,20),(3,20),(4,30),(5, 20),(6, 10) AS s(id, val)
ORDER BY id;

Error: distinct cannot be used with a window frame or an order.


Expected output:

SELECT *,
  COUNT(DISTINCT val) OVER(ORDER BY id) AS CNT_CUMULATIVE,
  COUNT(DISTINCT val) OVER(ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) 
     AS CNT_SLIDING
FROM VALUES (0, NULL),(1,10),(2,20),(3,20),(4,30),(5, 20),(6, 10) AS s(id, val)
ORDER BY id;

+----+-----+----------------+-------------+
| ID | VAL | CNT_CUMULATIVE | CNT_SLIDING |
+----+-----+----------------+-------------+
|  0 |     |              0 |           0 |
|  1 |  10 |              1 |           1 |
|  2 |  20 |              2 |           2 |
|  3 |  20 |              2 |           2 |
|  4 |  30 |              3 |           3 |
|  5 |  20 |              3 |           2 |
|  6 |  10 |              3 |           3 |
+----+-----+----------------+-------------+
1
  • hmm sigh nether bitand_agg or bitmap_construct_agg support sliding windows. Commented Jan 21 at 2:51

1 Answer 1

2

The idea is to utilize ARRAY_AGG to build an array and ARRAY_SIZE(ARRAY_DISTINCT...) to compute its size:

SELECT *,
  ARRAY_AGG(val) OVER(ORDER BY id) AS AGG_HELPER1,
  ARRAY_SIZE(ARRAY_DISTINCT(ARRAY_AGG(val) OVER(ORDER BY id))) AS CNT_CUMULATIVE,

  ARRAY_AGG(val) OVER(ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS AGG_HELPER2,
  ARRAY_SIZE(ARRAY_DISTINCT(ARRAY_AGG(val) OVER(ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW))) 
       AS CNT_SLIDING,
FROM VALUES (0, NULL),(1,10),(2,20),(3,20),(4,30),(5, 20), (6, 10) AS s(id, val)
ORDER BY id;

Output:

enter image description here

Effectively:

COUNT(DISTINCT <col>) OVER(ORDER BY ...) <~> ARRAY_SIZE(ARRAY_DISTINCT(ARRAY_AGG(<col>) OVER(ORDER BY ...)))

Side note: "The maximum amount of data that ARRAY_AGG can return for a single call is 16MB."


Bonus: This pattern can be extended for SUM(DISTINCT... )OVER(ORDER BY ...) too

SELECT *,
  SUM(DISTINCT val) OVER(ORDER BY id) AS SUM_CUMULATIVE,
  SUM(DISTINCT val) OVER(ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS SUM_SLIDING,
FROM VALUES (0, NULL),(1,10),(2,20),(3,20),(4,30),(5, 20), (6, 10) AS s(id, val)
ORDER BY id;
-- Error: distinct cannot be used with a window frame or an order. 

Becomes:

SELECT *,
  ARRAY_AGG(val) OVER(ORDER BY id) AS AGG_HELPER1,
  REDUCE(ARRAY_DISTINCT(ARRAY_AGG(val) OVER(ORDER BY id)),
           0, (acc, x) -> acc + x) AS SUM_CUMULATIVE,
    
  ARRAY_AGG(val) OVER(ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS AGG_HELPER2,
  REDUCE(ARRAY_DISTINCT(ARRAY_AGG(val) OVER(ORDER BY id ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)),
           0, (acc, x) -> acc + x) AS SUM_SLIDING
FROM VALUES (0, NULL),(1,10),(2,20),(3,20),(4,30),(5, 20), (6, 10) AS s(id, val)
ORDER BY id;

Output:

enter image description here

Effectively:

SUM(DISTINCT <col>) OVER(ORDER BY ...) <~> REDUCE(ARRAY_DISTINCT(ARRAY_AGG(<col>) OVER(ORDER BY ...)), 0, (acc, val) -> acc + val)

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.