1

trying to get the following SQL Presto code to work so that I can perform business day calculations. It works if I write "d -> day_of_week(d) not in (6,7)" but I also need to filter out holidays from a calendar hence the "and contains(array_agg(date), d)" clause which does not work. I have also tried contains(date, d) but this failed as well. Please advise.

WITH dataset(start_date, end_date) AS (
 values     (date '2021-10-01', date '2021-10-05'),
    (date '2021-10-01', date '2021-10-03'),
    (date '2021-10-02', date '2021-10-10'),
    (date '2021-10-02', date '2021-10-08'),
    (date '2021-10-02', date '2021-10-05')
),
holidays (row, date) AS (
    values (1,date '2021-10-08'))

select start_date,
       end_date,
       cardinality(filter(
               sequence(start_date, end_date, interval '1' day),
                d -> day_of_week(d) not in (6,7) and not contains(array_agg(date), d) 
           )) business_days
from dataset, holidays

1 Answer 1

1

You can flatten the data, filter and then aggregate it back:

-- sample data
WITH dataset(start_date, end_date) AS (
 values     (date '2021-10-01', date '2021-10-05'),
    (date '2021-10-01', date '2021-10-03'),
    (date '2021-10-02', date '2021-10-10'),
    (date '2021-10-02', date '2021-10-08'),
    (date '2021-10-02', date '2021-10-05')
),
holidays (row, date) AS (
    values (1,date '2021-10-08')),
-- query
with_ids as (
    select start_date,
           end_date,
           row_number() over () rn
    from dataset
)

select arbitrary(start_date) start_date, 
    arbitrary(end_date) end_date, 
    array_agg(day) business_days
from with_ids, 
    unnest(sequence(start_date, end_date, interval '1' day)) as t(day)
where day not in (select date from holidays)
group by rn;

Output:

start_date end_date business_days
2021-10-01 2021-10-05 [2021-10-01, 2021-10-02, 2021-10-03, 2021-10-04, 2021-10-05]
2021-10-02 2021-10-10 [2021-10-02, 2021-10-03, 2021-10-04, 2021-10-05, 2021-10-06, 2021-10-07, 2021-10-09, 2021-10-10]
2021-10-02 2021-10-08 [2021-10-02, 2021-10-03, 2021-10-04, 2021-10-05, 2021-10-06, 2021-10-07]
2021-10-02 2021-10-05 [2021-10-02, 2021-10-03, 2021-10-04, 2021-10-05]
2021-10-01 2021-10-03 [2021-10-01, 2021-10-02, 2021-10-03]

The only problem here is that this will result in deletion of ranges without business days (you can bring them back by left joining the original data).

Sign up to request clarification or add additional context in comments.

7 Comments

I am trying to figure out the logic of how this works so let me know if I am understanding this. 1. Make sub query with row numbers to iterate over. 2.Randomly select from the start date column. 3. Randomly select from the end date column—I am really confused on this point.
@WesleyYoung 2. and 3. there is no actual randomness since all rows in the group (after the group by) will have the same start and end (since they have the same rn). Try executing the query in steps. Sub-query, then without group by.
I can seee that it works, I am
@WesleyYoung if answer works for you please feel free to upvote and mark it as accepted one. If something still is not clear - please ask.
sorry I was typing answer on my phone then it died. I don’t have access to test the solution until Monday as this is for work. What I was trying to say was that I can see from your example that it may work, I am just confused as to the why as the documentation simply says I should expect an arbitrary result from using arbitrary. trino.io/docs/current/functions/aggregate.html
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.