
I have a table with a column of numbers and other columns with other data. I have created a column "floor10" that rounds each number down to the nearest multiple of 10 (e.g. 8 -> 0, 17 -> 10, etc.). What I'd like to do is create another column that shows, for each row, the previous distinct value of "floor10" (its lag). There's no fixed number of rows per distinct value of "floor10", so I'm not sure how to do this with LAG in SQL on AWS Athena.

For example,

Num  floor10  Name
---  -------  -----
  2        0  James
 23       20  James
 28       20  James
 16       10  James
 38       30  James
  8        0  John
 54       50  John
 56       50  John
 28       20  John
 22       20  John
 25       20  John

I'd like something like,

Num  floor10  Name   floor10_prev
---  -------  -----  ------------
  2        0  James
 16       10  James             0
 23       20  James            10
 28       20  James            10
 38       30  James            20
  8        0  John
 22       20  John              0
 25       20  John              0
 28       20  John              0
 54       50  John             20
 56       50  John             20

The following did not work:

SELECT
Num,
floor10,
Name,
LAG(floor10) OVER (PARTITION BY Name, Num ORDER BY Name, Num) AS floor10_prev
FROM table

Alternatively, I tried a subquery that selects the distinct "floor10" and "Name" pairs, applies LAG to it, and is then joined back to the main table. This produced the desired result, but it doesn't seem efficient: on a much larger dataset it took a long time to complete.
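
The subquery approach described above can be sketched as follows. This is a reconstruction for illustration, not the original query; `tbl` is a placeholder for the real table name, and `levels` and `d` are hypothetical aliases.

```sql
-- Sketch of the DISTINCT + LAG + JOIN approach.
-- "tbl" is a placeholder table name; "levels" and "d" are illustrative aliases.
WITH levels AS (
    -- One row per (Name, floor10), carrying the previous distinct floor10 for that Name
    SELECT Name,
           floor10,
           LAG(floor10) OVER (PARTITION BY Name ORDER BY floor10) AS floor10_prev
    FROM (SELECT DISTINCT Name, floor10 FROM tbl) AS d
)
SELECT t.Num, t.floor10, t.Name, l.floor10_prev
FROM tbl AS t
JOIN levels AS l
  ON l.Name = t.Name
 AND l.floor10 = t.floor10
ORDER BY t.Name, t.Num;
```

Because every (Name, floor10) pair in `tbl` also appears in `levels`, an inner join keeps all rows. The cost comes from reading the table twice, once for the DISTINCT and once for the join.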

  • Your desired result doesn't make sense. Your second row (name=James, num=23), why do you expect the previous floor10 to be 10? It's 0. In any case, I think you should only be partitioning by name, and ordering by num. Commented Sep 17, 2024 at 13:24
  • Sorry, the desired result wasn't ordered because I wanted to illustrate what it would look like in the original order, but obviously using lag would reorder it. I've edited the desired result to accommodate that. Commented Sep 17, 2024 at 14:00

2 Answers


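You can calculate the offset for lag() dynamically: use each row's row number within its group of rows sharing (name, floor10), so that every row in a group reaches back to the last row of the previous group. See the example: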

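select *
  -- step back rn rows to land on the last row of the previous floor10 group
  ,lag(floor10, rn) over(partition by name order by num) prev_floor10
from(
   select *
     -- rn = position of the row within its (name, floor10) group, ordered by num
     ,row_number() over(partition by name, floor10 order by num) rn
   from test
)a
order by name, num

Num  floor10  Name   rn  prev_floor10
---  -------  -----  --  ------------
  2        0  James   1          null
 16       10  James   1             0
 23       20  James   1            10
 28       20  James   2            10
 38       30  James   1            20
  8        0  John    1          null
 22       20  John    1             0
 25       20  John    2             0
 28       20  John    3             0
 54       50  John    1            20
 56       50  John    2            20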


Test data

Num  floor10  Name
---  -------  -----
  2        0  James
 23       20  James
 28       20  James
 16       10  James
 38       30  James
  8        0  John
 54       50  John
 56       50  John
 28       20  John
 22       20  John
 25       20  John
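
To try the approach without creating a table, the test data can be inlined with a VALUES list. The following is a minimal sketch, assuming a Trino-based Athena engine where lag() accepts a per-row offset expression. The CTE name `numbered` is illustrative. Both window definitions order by Num so that rn matches the row's position inside its floor10 group; this relies on Num being unique per name.

```sql
-- Assumption: Trino-based Athena engine; lag() takes a per-row offset (rn).
WITH test AS (
    SELECT *
    FROM (VALUES
        ( 2,  0, 'James'),
        (23, 20, 'James'),
        (28, 20, 'James'),
        (16, 10, 'James'),
        (38, 30, 'James'),
        ( 8,  0, 'John'),
        (54, 50, 'John'),
        (56, 50, 'John'),
        (28, 20, 'John'),
        (22, 20, 'John'),
        (25, 20, 'John')
    ) AS t (Num, floor10, Name)
),
numbered AS (
    -- rn counts rows inside each (Name, floor10) group in Num order
    SELECT Num, floor10, Name,
           ROW_NUMBER() OVER (PARTITION BY Name, floor10 ORDER BY Num) AS rn
    FROM test
)
SELECT Num, floor10, Name,
       -- stepping back rn rows always lands on the last row of the previous group
       LAG(floor10, rn) OVER (PARTITION BY Name ORDER BY Num) AS floor10_prev
FROM numbered
ORDER BY Name, Num;
```

If Num can repeat within a name, adding a unique tie-breaker column to both ORDER BY clauses would keep the two numberings consistent.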

Using LAG() with a dynamic offset is one option. Another is to create a CTE that enumerates the rows per Name (Name_id) and the repeating floor10 values (rn), and then self-join it ON Name_id - 1. The join shifts each row onto its predecessor, and a Case expression handles the repeating floor10 values.

--    S a m p l e    D a t a :
Create Table tbl ( Num Int, floor10 Int, Name Varchar(32) );
Insert Into tbl
  VALUES  (  2,  0, 'James'  ), 
          ( 16, 10, 'James'  ),
          ( 23, 20, 'James'  ), 
          ( 28, 20, 'James'  ), 
          ( 38, 30, 'James'  ),
          (  8,  0, 'John'   ), 
          ( 22, 20, 'John'   ), 
          ( 25, 20, 'John'   ),
          ( 28, 20, 'John'   ),
          ( 54, 50, 'John'   ),
          ( 56, 50, 'John'   );

... create cte like below - it will be used for both solutions ...

WITH
  grid AS
    ( Select   Name,
               Sum(1) Over( Partition By Name 
                            Order  By Name, Num
                            Rows Between Unbounded Preceding And Current Row) as Name_id,
               --
               Num, floor10,
-- Row_Number() below enumerates repeating floor10 values as rn [1,2,3,...]
               Row_Number() Over(Partition By Name, floor10 Order By Name, Num) as rn
      From     tbl
    )

1. Using LAG()

--      M a i n    S Q L :  
Select g1.Num, g1.floor10, g1.Name, 
       LAG(g1.floor10, g1.rn) Over(Partition By g1.Name Order By g1.Name, g1.Num) as prev_floor10
From   grid g1
Order By g1.Name, g1.Name_id

2. Self join with shift

--      M a i n    S Q L : 
Select     g1.Num, g1.floor10,  g1.Name,
-- Max(Case ...) below takes the previous floor10 from the first of repeating values 
           Max( Case When g1.rn = 1 Then g2.floor10 End ) Over(Partition By g2.Name 
                                Order By g2.Name, g2.Num
                                Rows Between Unbounded Preceding And Current Row) as prev_floor10
From       grid g1
Left Join  grid g2 ON(g2.Name = g1.Name And g2.Name_id = g1.Name_id - 1)
Order By   g1.Name, g1.Name_id

... both resulting as ...

/*    R e s u l t : 
Num  floor10  Name    prev_floor10
---  -------  ------  ------------
  2        0  James           null
 16       10  James              0
 23       20  James             10
 28       20  James             10
 38       30  James             20
  8        0  John            null
 22       20  John               0
 25       20  John               0
 28       20  John               0
 54       50  John              20
 56       50  John              20      */ 


1 Comment

Have you checked the execution plan on this and compared it with one for lag?
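
One way to make that comparison on Athena is to prefix each variant with EXPLAIN and compare the printed plans. The sketch below does this for the LAG() variant. It assumes the sample table `tbl` from the answer exists in the current database; the alias `g` is illustrative.

```sql
-- Assumes the sample table tbl exists in the current database.
-- EXPLAIN prints the query plan without running the query.
EXPLAIN
SELECT Num, floor10, Name,
       LAG(floor10, rn) OVER (PARTITION BY Name ORDER BY Num) AS prev_floor10
FROM (
    SELECT Num, floor10, Name,
           ROW_NUMBER() OVER (PARTITION BY Name, floor10 ORDER BY Num) AS rn
    FROM tbl
) AS g;
```

Replacing EXPLAIN with EXPLAIN ANALYZE runs the query and reports measured statistics per stage, which is usually more telling than the plan alone when comparing against the self-join variant.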
