SQL window functions

Question

I've been playing around with dense_rank, LAG, LEAD, ... all kinds of sorting but don't seem able to get this one solved.

Here's a sample of my data and the result I expect/need:

pk_id	pk_id_row_num	result_Tech	source_id	source_descr	ranking
5649385	5649385_1		1	Tech	1
5649385	5649385_2	OK	2	IAC	1
5437376	5437376_1		1	Tech	2
5437376	5437376_2	CANCEL	1	Tech	2
5649387	5649387_1		1	Tech	3
5649387	5649387_2	OK	2	IAC	3
5649387	5649387_3	FWD	1	Tech	4
5649387	5649387_4	OK	2	IAC	4
5649387	5649387_5	FWD	1	Tech	5
5649387	5649387_6	OK	2	IAC	5
5649387	5649387_7	FWD	1	Tech	6
5649387	5649387_8	OK	2	IAC	6
5647621	5647621_1		1	Tech	7
5647621	5647621_2	CANCEL	1	Tech	7
5647621	5647621_3	CANCEL	1	Tech	7
5649364	5649364_1		1	Tech	8
5649364	5649364_2	OK	2	IAC	8
5649364	5649364_3	FWD	1	Tech	9
5649364	5649364_4	OK	2	IAC	9
5649396	5649396_1		1	Tech	10
5649396	5649396_2	FWD	2	IAC	10
5649396	5649396_3	OK	2	IAC	10
5652537	5652537_1		1	Tech	11
5652537	5652537_2	FWD	2	IAC	11
5652537	5652537_3	OK	2	IAC	11
5652537	5652537_4	FWD	1	Tech	12
5652537	5652537_5	OK	2	IAC	12
5652537	5652537_6	CANCEL	1	Tech	12

This is about a tech raising a ticket for an operator. I need to group the correct actions with the correct request but there are several possibilities.

The most common case is a request from a tech gets resolved by an operator.

But a tech can open the ticket again (FWD in result_tech) after an operator has closed it. In this case this is considered a new sequence within the existing ticket.

Also an operator can perform a 'FWD' to another operator. But than this remains in the same sequence.

It comes down to this that, within the scope (tried partition by etc ) of 1 PK_ID, a combination of (result_tech is null and source_id = 1) or (result_tech = 'FWD' and source_id = 1) defines the start of the sequence and all following (ordered by pk_id_row_num) records define actions on this sequence, being

a solution (result_tech = 'OK') or a forward (result_tech = 'FWD') from an operator (source_id = 2)
a cancel from the tech (result_tech = 'CANCEL' and source_id = 1)

Mostly you'll have only 2 records in 1 pk_id/sequence but sometimes we have 3 and theoretically it could an infinite number of records if the different operators keep on forwarding the request.

The column ranking is my desired result. This allows me to group the different actions within the scope of 1 pk_id to the correct sequence of events.

As said I tried several approaches. The last one gets me some closer but not quite yet. There is still some fine tuning needed here. Hope somebody can help me out.

There is no use of dense_rank() here but I have been playing with that too.

SELECT 
    pk_id
    ,pk_id_source_id
    ,reason_id
    ,reason_desc
    ,result_tech
    ,source_id
    ,source_descr
    ,CASE
        WHEN (result_tech IS NULL OR result_tech = 'FWD') AND source_id = 1 THEN 'START'
        WHEN Lead(source_id,1,1) Over(PARTITION BY pk_id ORDER BY pk_id_source_id) <=  source_id THEN 'NEXT'
    END sorting
FROM 
    My_example_table
WHERE 
    pk_id IN (5437376, 5647621, 5649364, 5649385, 5649387, 5649396, 5652537)
ORDER BY 
    pk_id_source_id;

I tried several combinations of windowing functions, mostly dense_rank, LAG and LEAD.

Played with several options to sort and or RESET, partitioning by, ...

The expected result is the last column. I just need to be able to identify, within the scope of 1 PK_ID, all records (start and actions) related to 1 sequence of events.

Please clearly state the exact rules for when the ranking should increase by one, going down your sample data. I don't see any obvious rule. — Tim Biegeleisen
– Tim Biegeleisen, Commented Feb 26 at 11:32
That's indeed a bit of a problem. I'll try to identify some cases. First we need to group/partition per pk_id. Within 1 pk_id you'll always have an first record indicating the start of the request. This is always indicated by result_tech is null and source_id = 1. Next there are a serries of actions/records which are linked to this first record. A new sequence is introduced in the same PK_ID if the tech does a 'FWD' — L.P.
– L.P., Commented Feb 26 at 13:02
Take a closer look to pk_id 5649387, here there are 4 sequences. Tech has opened, operator has closed, tech reopened, operator closed again. etc.. With pk_id 5652537, the tech has opened the request, the operator has FWD the request and another operator has closed. This is also 1 sequence. The second sequence for this starts as soon as the tech does a FWD (which indicates a reopen of the request) A clear rule to indicate the start of a sequence whithin the scope of a PK_ID might be (result_tech IS NULL OR result_tech = 'FWD') AND source_id = 1 — L.P.
– L.P., Commented Feb 26 at 13:02
Edit your question, delete the blurb and post clear business rules. We can write code for you, but we need to understand the requirement. — Tim Biegeleisen
– Tim Biegeleisen, Commented Feb 26 at 13:15

Chris Maurer · Accepted Answer · 2025-02-26 16:13:26Z

0

I usually solve these with a running sum/count like

Sum(Case When coalesce(result_tech,'FWD')='FWD' and source_id=1 Then 1 End)
  Over (Partition By pk_id Order By pk_id_row_num Rows Between Unbounded Preceding And Current Row) as grouping

You can then write an outer SQL to use it in a Group By or whatever...

Note that most database systems assume that a count or sum in a window function by default is a running sum/count. If your DB is one of those, you can omit the windowing clause (Rows Between Unbounded Preceding And Current Row)

edited Feb 26 at 16:13

answered Feb 26 at 16:06

Chris Maurer

2,9321 gold badge13 silver badges9 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

L.P. Feb 27 at 10:15

Really helped a lot and, as with many things, once you see it, you can't reverse anymore Opens a new world of possibilities, the trick with the limitation on the rows was quite new for me.

Thorsten Kettner · Accepted Answer · 2025-02-26 17:01:22Z

0

So far it seems you want to increase the ranking number everytime a Tech has a result_tech that is either null or FWD. Use a running conditional count for this.

In standard SQL:

select 
  t.*,
  count(*)
   filter (where source_descr = 'Tech' and coalesce(result_tech, 'FWD') = 'FWD') 
   over (order by pk_id_row_num) as ranking
from mytable t
order by pk_id_row_num;

If your DBMS doesn't support the filter clause:

select 
  t.*,
  count(case when source_descr = 'Tech' and coalesce(result_tech, 'FWD') = 'FWD' then 1 end) 
   over (order by pk_id_row_num) as ranking
from mytable t
order by pk_id_row_num;

answered Feb 26 at 17:01

Thorsten Kettner

96.6k8 gold badges56 silver badges82 bronze badges

Comments

Guillaume Outters · Accepted Answer · 2025-02-26 17:28:32Z

You can flag each row as being a state transition (→ increase in group id) or not, then sum the transitions:

with tr as
(
    select
        *,
        case when
            coalesce(pk_id <> lag(pk_id) over (order by pk_id, pk_id_row_num), true)
            or (source_id = 1 and (result_tech = 'FWD' or result_tech is null))
        then 1 else 0 end as transitions
    from t
)
select *, sum(transitions) over (order by pk_id, pk_id_row_num) ranking
from tr

Or if you don't mind having holes between sequence ids, rely on a (windowed) rank + the sum of the (non-windowed, thus summable) FWD criteria that increases the group id:

select
  *,
  -- rank() is the same for same pk_ids, thus giving us a macro group ID.
  rank() over (order by pk_id)
  -- Add the count of previous FWD over the whole set. This addition will never flow over the rank() of the next group (of the next pk_id),
  -- because count(FWD) over the partition <= count(rows) over the partition (and next rank() = this rank() + count(rows)).
  + sum(case when source_id = 1 and (result_tech = 'FWD' or result_tech is null) then 1 else 0 end) over (partition by pk_id order by pk_id_row_num)
  as ranking
from t

Finally a dense_rank() will combine the simplicity of solution 2. (1 query) with the dense results of 1. (no holes in ranking, unless a serie starts with a FWD).
Here the sum() should run over the whole table, because we have no more space between dense_rank()s values to intercalate per-pk_id count of FWD.

select
  *,
  dense_rank() over (order by pk_id)
  + sum(case when source_id = 1 and (result_tech = 'FWD' or result_tech is null) then 1 else 0 end) over (order by pk_id, pk_id_row_num)
  as ranking
from t;

You'll find them summarized under a small fiddle (for PostgreSQL, but should be portable).

Collectives™ on Stack Overflow

SQL window functions

3 Answers 3

1 Comment

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related