
I need help optimizing a cursor, or replacing the code completely. I have the following requirement:

Create a column Sequence grouped by ColumnA, ColumnD and GroupA, with StartA used for sorting. I have tried LAG, ROW_NUMBER, etc. with no joy, because the sequence has to restart whenever ColumnD changes, taking into account ColumnA (whose values can repeat) and GroupA, sorted by StartA.

The code below works fine for a small set of records, but the last time I ran it, it took over 3 hours without completing, so I killed the job. The table has over 700,000 records. I'm looking for any tips on how to improve this. Thank you! Sample result using DENSE_RANK:

[Image: sample result using DENSE_RANK]

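DECLARE
        @ColumnA VARCHAR(10),
        @StartA DATETIME,
        @ColumnD VARCHAR(50),
        @Sequence INTEGER,
        @Sequence_Calc INTEGER = 1,
        @Previous_ColumnA VARCHAR(10),
        @Previous_ColumnD VARCHAR(50)

-- Copy TABLEA into a temp table. SELECT ... INTO creates a heap with no indexes,
-- and this ORDER BY does not control the order in which rows are read back later.
SELECT *
    INTO #Temp_Table
    FROM TABLEA
    ORDER BY ColumnA,
        ColumnD

-- Walk the rows one at a time, ordered by ColumnA, ColumnD (StartA is not part of the sort)
DECLARE Seq_Cursor CURSOR

    FOR SELECT  ColumnA,
             StartA,
             ColumnD,
             Sequence
      FROM #Temp_Table
      ORDER BY ColumnA,
        ColumnD

FOR UPDATE OF Sequence

OPEN Seq_Cursor

    FETCH NEXT FROM Seq_Cursor
        INTO    @ColumnA, @StartA, @ColumnD, @Sequence

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN
        -- Searched UPDATE: with no index on #Temp_Table, each iteration scans the
        -- whole table, so total work grows roughly with the square of the row count
        UPDATE #Temp_Table
        SET Sequence = @Sequence_Calc
        WHERE ColumnD = @ColumnD
        AND StartA = @StartA
        AND ColumnA = @ColumnA

        SET @Previous_ColumnA = @ColumnA
        SET @Previous_ColumnD = @ColumnD
    END

    FETCH NEXT FROM Seq_Cursor
    INTO     @ColumnA, @StartA, @ColumnD, @Sequence

    BEGIN
         -- Sequence for the row just fetched: restart at 1 when ColumnD changes;
         -- within the same ColumnD, add 1 whenever ColumnA changes
         SELECT @Sequence_Calc = CASE WHEN @Previous_ColumnD = @ColumnD THEN
                                 CASE WHEN @Previous_ColumnA <> @ColumnA THEN @Sequence_Calc + 1 ELSE @Sequence_Calc END
                                 ELSE 1 END
    END
END

CLOSE Seq_Cursor
DEALLOCATE Seq_Cursor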
  • Have you considered binning the cursor? SQL is a set-based language; it excels at set-based logic, not iterative tasks. Commented Apr 1, 2024 at 11:38
  • Please add sample data as text. Commented Apr 1, 2024 at 12:20
  • I'm not keen to reverse-engineer non-working code, so can you explain how GroupA is derived? The published data doesn't make sense to me. Commented Apr 1, 2024 at 13:09
  • As per the question guide, please do not post images of code, data, error messages, etc. - copy or type the text into the question. Please reserve the use of images for diagrams or demonstrating rendering bugs, things that are impossible to describe accurately via text. Commented Apr 1, 2024 at 19:03
  • For now, park the fact that your code doesn't produce the expected result and explain how the expected result is derived. I can see that the rows up to BB:28/11/22 appear to be derived on the basis of a change to ColumnA and/or a change to StartA, but this doesn't hold true for BB:28/11/22 and the following row BB:01/12/22. A similar edge case occurs at BB:5/12/22. Commented Apr 2, 2024 at 7:21

1 Answer


Not sure why you're messing around with cursors: they are slow and inefficient, and complex both to write and to understand.

It's really hard to tell without a fuller explanation of the desired logic, but this looks like a gaps-and-islands problem.
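To illustrate why ranking functions alone don't fit a gaps-and-islands problem, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later). The rows are hypothetical and cover a single ColumnD partition. It compares DENSE_RANK with an island ID built from LAG plus a running COUNT.

```python
import sqlite3

# Hypothetical toy rows for a single ColumnD partition, as (ColumnA, StartA).
# Ordered by StartA, ColumnA runs A, A, B, A, so "A" forms two separate islands.
ROWS = [
    ("A", "2022-11-01"),
    ("A", "2022-11-02"),
    ("B", "2022-11-03"),
    ("A", "2022-11-04"),
]

QUERY = """
WITH Flagged AS (
    SELECT ColumnA, StartA,
           -- 1 when ColumnA differs from the previous row (or there is none), else NULL
           CASE WHEN ColumnA = LAG(ColumnA) OVER (ORDER BY StartA)
                THEN NULL ELSE 1 END AS IsStart
    FROM t
)
SELECT ColumnA,
       DENSE_RANK() OVER (ORDER BY ColumnA) AS DenseRank,
       -- running count of island starts (COUNT skips NULLs) = island number
       COUNT(IsStart) OVER (ORDER BY StartA ROWS UNBOUNDED PRECEDING) AS IslandID
FROM Flagged
ORDER BY StartA
"""


def dense_rank_vs_islands():
    """Return (ColumnA, DenseRank, IslandID) for each row, ordered by StartA."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (ColumnA TEXT, StartA TEXT)")
        con.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in dense_rank_vs_islands():
    print(row)
```

The last row gets the same DenseRank (1) as the first two rows, because DENSE_RANK only looks at the value, whereas its IslandID is 3: the reappearing "A" starts a new island.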

You need to:

  • use LAG to mark the rows that are the start of a new group
  • then use a windowed conditional COUNT to create a group ID
  • then use ROW_NUMBER partitioned by that ID.
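WITH StartValues AS (
    -- Flag rows that start a new group: 1 when ColumnA or GroupA differs from the
    -- previous row (by StartA) within the same ColumnD, else NULL. The first row of
    -- each partition has no previous row, so LAG returns NULL and it is flagged too.
    SELECT *,
      CASE WHEN
          ColumnA = LAG(ColumnA) OVER (PARTITION BY ColumnD ORDER BY StartA)
          AND GroupA = LAG(GroupA) OVER (PARTITION BY ColumnD ORDER BY StartA)
        THEN NULL ELSE 1 END AS IsStart
    FROM TABLEA a
),
Grouped AS (
    -- Running count of the start flags (COUNT ignores NULLs) gives each group an ID.
    -- Explicit ROWS frame: with ORDER BY and no frame, SQL Server defaults to RANGE.
    SELECT *,
      COUNT(IsStart) OVER (PARTITION BY ColumnD ORDER BY StartA
                           ROWS UNBOUNDED PRECEDING) AS GroupID
    FROM StartValues
)
-- Number the rows within each group
SELECT *,
  ROW_NUMBER() OVER (PARTITION BY ColumnD, GroupID ORDER BY StartA) AS Sequence
FROM Grouped;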
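To check the LAG / COUNT / ROW_NUMBER approach end to end, here is a minimal sketch that runs it through Python's sqlite3 module on a few hypothetical rows. SQLite (3.25 or later) stands in for SQL Server. The table and column names follow the question, but the data is invented. The second CTE reads from StartValues, and the explicit ROWS UNBOUNDED PRECEDING frame on the running count is a choice made here.

```python
import sqlite3

# Hypothetical sample rows: (ColumnD, ColumnA, GroupA, StartA).
ROWS = [
    ("X", "A", "G1", "2022-11-01"),
    ("X", "A", "G1", "2022-11-02"),
    ("X", "B", "G1", "2022-11-03"),
    ("X", "A", "G1", "2022-11-04"),  # A again after B: new group
    ("X", "A", "G2", "2022-11-05"),  # same ColumnA, new GroupA: new group
    ("X", "A", "G2", "2022-11-06"),
    ("Y", "A", "G1", "2022-11-01"),  # new ColumnD partition: numbering restarts
    ("Y", "A", "G1", "2022-11-02"),
]

QUERY = """
WITH StartValues AS (
    SELECT *,
      CASE WHEN
          ColumnA = LAG(ColumnA) OVER (PARTITION BY ColumnD ORDER BY StartA)
          AND GroupA = LAG(GroupA) OVER (PARTITION BY ColumnD ORDER BY StartA)
        THEN NULL ELSE 1 END AS IsStart
    FROM TABLEA a
),
Grouped AS (
    SELECT *,
      COUNT(IsStart) OVER (PARTITION BY ColumnD ORDER BY StartA
                           ROWS UNBOUNDED PRECEDING) AS GroupID
    FROM StartValues
)
SELECT ColumnD, StartA, ColumnA, GroupA, GroupID,
  ROW_NUMBER() OVER (PARTITION BY ColumnD, GroupID ORDER BY StartA) AS Sequence
FROM Grouped
ORDER BY ColumnD, StartA
"""


def run_sequence_query():
    """Load ROWS into an in-memory TABLEA and return the numbered rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE TABLEA (ColumnD TEXT, ColumnA TEXT, GroupA TEXT, StartA TEXT)"
        )
        con.executemany("INSERT INTO TABLEA VALUES (?, ?, ?, ?)", ROWS)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in run_sequence_query():
    print(row)
```

With these rows, Sequence restarts at 1 when ColumnA returns to "A" after "B", when GroupA changes, and when ColumnD changes. Whether those are exactly the asker's rules is an assumption, since the desired logic was never fully spelled out in the question.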

2 Comments

Thanks @Charlieface for the suggestion. I have tried DENSE_RANK previously with no success. The example provided groups the sequence by ColumnA, which does not give the desired results, so I changed the code to DENSE_RANK () over (partition by ColumnD, ColumnA order by GroupA, StartA, ColumnA). This works for the initial rows but not for later ones, as the values in ColumnA continue to repeat. Adding a text file with results. Do you have any other suggestions?
This is absolutely amazing! Exactly what I needed, in a much simpler and more streamlined way. The process is now down from 8 hrs to 9 seconds! Thanks a million!!!
