0

I am struggling to optimize my query where I need to combine orders data and events data using timestamp so that I attribute certain type of the event to the order in this way that the closest event (table A) will be attributed to order (table B).

Examples: Table A:

  • Timestamp: 2024-01-01 10:00; Event: Red
  • Timestamp: 2024-01-01 12:00; Event: Blue
  • Timestamp: 2024-01-01 14:00; Event: Pink

Table B:

  • Timestamp: 2024-01-01 11:00; Order: Cat; This gets attributed to "Red"
  • Timestamp: 2024-01-01 11:40; Order: Dog; This gets attributed to "Red"
  • Timestamp: 2024-01-02 09:10; Order: Bird; This gets attributed to "Pink"

Currently, I am doing it using cross join but it is not performant at all:

WITH ClosestMatch AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY b.Order ORDER BY UNIX_TIMESTAMP(b.Timestamp) - UNIX_TIMESTAMP(a.Timestamp)) AS Rank
    FROM 
        TableB b
    CROSS JOIN 
        TableA a
    WHERE b.Timestamp <= a.Timestamp

)
SELECT
    Order,
    B_Timestamp,
    A_Timestamp,
    Event
FROM 
    ClosestMatch
WHERE 
    Rank = 1
ORDER BY 
    B_Timestamp;

Do you have any ideas how to improve it?

1
  • Why Cat and Dog are attributed to Red if Red is "before them" while the query clearly states that only ones "after" the order should be accounted? Commented Jan 20 at 20:15

2 Answers 2

0

There is not much you can do I think. There is one dirty hack you can try though. You can "linearize" your two tables into one, order it by the "shared" timestamp and then do some windows functions magic:

-- sample data
WITH table_a(Timestamp, Event) as (
    values (timestamp '2024-01-01 10:00', 'Red'),
    (timestamp '2024-01-01 12:00', 'Blue'),
    (timestamp '2024-01-01 14:00', 'Pink')
),
table_b(Timestamp, "Order") as (
    values (timestamp '2024-01-01 11:00', 'Cat'),
    (timestamp '2024-01-01 11:40', 'Dog'),
    (timestamp '2024-01-02 09:10', 'Bird')
),
        
-- query parts
merged AS (
    select 'event' type,
           null "order",
           table_a.*
    from table_a
    union
    select 'order' type,
            table_b."Order",
           table_b.Timestamp,
           null event
    from table_b
)

select "order",
    timestamp,
    event,
    event_time
from(
    SELECT
        type,
        "order",
        timestamp,
        lead(event) IGNORE NULLS over (order by timestamp) event,
        lead(if(type='event', timestamp)) IGNORE NULLS over (order by timestamp) event_time
    FROM
        merged
)
where type = 'order'

Output (does not match the one from you query, which does not match your desired output as I stated in the comments, so I made a bit of a guess what you need):

order timestamp event event_time
Cat 2024-01-01 11:00:00.000 Blue 2024-01-01 12:00:00.000
Dog 2024-01-01 11:40:00.000 Blue 2024-01-01 12:00:00.000
Bird 2024-01-02 09:10:00.000

You can use lag in the similar way to lead to get the closest event from "other side" (i.e. before) if needed.

Sign up to request clarification or add additional context in comments.

Comments

0

drawing inspiration from Guru Stron's answer, you can add the LEAD window function to your event table--assuming events have unique timestamps. then set the join condition so that the timestamps in your order table fall between the timestamps in your events table.

WITH
  table_a ("timestamp", "event") AS (VALUES
    (TIMESTAMP '2024-01-01 10:00', 'Red'),
    (TIMESTAMP '2024-01-01 12:00', 'Blue'),
    (TIMESTAMP '2024-01-01 14:00', 'Pink')
  ),
  table_b ("timestamp", "order") AS (VALUES
    (TIMESTAMP '2024-01-01 11:00', 'Cat'),
    (TIMESTAMP '2024-01-01 11:40', 'Dog'),
    (TIMESTAMP '2024-01-02 09:10', 'Bird')
  ),
  table_a_with_ranges AS (
    SELECT
      *,
      LEAD("timestamp") OVER(ORDER BY "timestamp") AS next_timestamp
    FROM
      table_a
  )
SELECT
  b."order",
  b."timestamp" AS b_timestamp,
  a."timestamp" AS a_timestamp,
  a.event
FROM
  table_b AS b
  JOIN table_a_with_ranges AS a
    ON CASE 
      WHEN a.next_timestamp IS NOT NULL THEN b."timestamp" BETWEEN a."timestamp" AND a.next_timestamp
      ELSE b."timestamp" >= a."timestamp"
        END

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.