3

I have this table

MachineCode DoCreation MachineStateCode
DM139 2024-04-03 00:32:17.377 99
DM139 2024-04-03 00:32:49.080 0
DM139 2024-04-03 01:51:14.427 99
DM139 2024-04-03 01:51:45.643 0
DM139 2024-04-03 06:07:38.357 3
DM139 2024-04-03 06:07:39.043 0
DM139 2024-04-03 06:23:37.433 100
DM139 2024-04-03 06:23:43.697 0
DM139 2024-04-03 06:24:46.153 1
DM139 2024-04-03 06:25:02.467 0
DM139 2024-04-03 06:46:50.410 100
DM139 2024-04-03 06:47:29.247 0
DM139 2024-04-03 06:47:46.397 100
DM139 2024-04-03 06:53:47.160 0
DM139 2024-04-03 06:59:41.633 1
DM139 2024-04-03 06:59:54.550 0
DM139 2024-04-03 07:00:03.203 1
DM139 2024-04-03 07:00:06.437 0
DM139 2024-04-03 07:00:14.247 1
DM139 2024-04-03 07:00:17.683 2
DM139 2024-04-03 07:00:18.153 1
DM139 2024-04-03 07:00:18.840 2
DM139 2024-04-03 07:00:26.860 1
DM139 2024-04-03 07:00:46.387 2

each records rapresent a "change of state" of the specific machine DM139 where

  • MachineCode = code of machine (nvarchar)
  • DoCreation = timestamp of when the machine change its state (datetime)
  • MachineStateCode = new state of the machine, where 99 = rebooting, 0 = initializing, 3 = error, 1 = starting, 2 = working and so on (int)

I need to create a new column, let's call it MachineStateCodeAdjusted where its value is equal to 3 if the previous MachineStateCode is 3 and the current MachineStateCode is not 2, otherwise it should have the current MachineStateCode value. (the next table is what I need)

MachineCode DoCreation MachineStateCode MachineStateCodeAdjusted
DM139 2024-04-03 00:32:17.377 99 99
DM139 2024-04-03 00:32:49.080 0 0
DM139 2024-04-03 01:51:14.427 99 99
DM139 2024-04-03 01:51:45.643 0 0
DM139 2024-04-03 06:07:38.357 3 3
DM139 2024-04-03 06:07:39.043 0 3
DM139 2024-04-03 06:23:37.433 100 3
DM139 2024-04-03 06:23:43.697 0 3
DM139 2024-04-03 06:24:46.153 1 3
DM139 2024-04-03 06:25:02.467 0 3
DM139 2024-04-03 06:46:50.410 100 3
DM139 2024-04-03 06:47:29.247 0 3
DM139 2024-04-03 06:47:46.397 100 3
DM139 2024-04-03 06:53:47.160 0 3
DM139 2024-04-03 06:59:41.633 1 3
DM139 2024-04-03 06:59:54.550 0 3
DM139 2024-04-03 07:00:03.203 1 3
DM139 2024-04-03 07:00:06.437 0 3
DM139 2024-04-03 07:00:14.247 1 3
DM139 2024-04-03 07:00:17.683 2 2
DM139 2024-04-03 07:00:18.153 1 1
DM139 2024-04-03 07:00:18.840 2 2
DM139 2024-04-03 07:00:26.860 1 1
DM139 2024-04-03 07:00:46.387 2 2

I used the LAG function to get the previous MachineStateCode (and it works) but to work as I want I need to get the previous MachineStateCodeAdjusted, that is not accessible for the LAG function.

The SQL query I'm executing is this:

   SELECT MachineCode
       , DoCreation
   , MachineStateCode
   , IIF(LAG(MachineStateCode, 1, MachineStateCode) OVER (ORDER BY DoCreation ASC) = 3 
  AND MachineStateCode<>2, 3, MachineStateCode)  AS MachineStateCodeAdjusted       
  FROM mch.MachineStateChanges

and the result I get clearly show me the it not works as I want

MachineCode DoCreation MachineStateCode MachineStateCodeAdjusted
DM139 2024-04-03 00:32:17.377 99 99
DM139 2024-04-03 00:32:49.080 0 0
DM139 2024-04-03 01:51:14.427 99 99
DM139 2024-04-03 01:51:45.643 0 0
DM139 2024-04-03 06:07:38.357 3 3
DM139 2024-04-03 06:07:39.043 0 3
DM139 2024-04-03 06:23:37.433 100 100 WRONG
DM139 2024-04-03 06:23:43.697 0 0 WRONG
DM139 2024-04-03 06:24:46.153 1 1 WRONG

I also tried with a recursive CTE and it works but it is very slow! The query extract in 4 minutes 2 days of work, while I need to extract 1 year!

What is the best (and fast) approach to fix it?

4
  • 1
    The description in your question is likely incorrect. You write where its value is equal to 3 if the previous MachineStateCode is 3 and the current MachineStateCode is not 2, but in your result, very many rows should have the value 3 although its previous value is not 3. So do you rather mean all rows beginning from the first row with 3 should get 3 as long as the value is not 2? By the way, which RDBMS do you use? Commented Apr 22 at 16:08
  • @JonasMetzler yes, I need exactly that behaviour. I'm using SQL Server and once found a solution I should move it to RedShift as well (but let's take it one step at a time!) Commented Apr 22 at 16:14
  • Why is 100 wrong? It's correct according to the rule you explained. Commented Apr 22 at 17:22
  • Don't waste time on a SQL Server solution when you're planning to move to Redshift. Commented Apr 22 at 17:23

3 Answers 3

3

I think your explanation is incomplete. You should have indicated that once you get an "error" (3) then all states do not matter, until you get a "working" (2) again.

In this case, this is a Gaps & Islands problem. You can use the traditional solution:

select z.*,
  case when f = 3 or (
          lag(MachineStateCode) 
            over(partition by MachineCode order by DoCreation) = 3
          and MachineStateCode <> 2
       )
       then 3 
       else MachineStateCode
  end as MachineStateCodeAdjusted
from (
  select y.*,
    first_value(MachineStateCode) over(partition by g order by DoCreation) as f
  from (
    select x.*, sum(i) over(partition by MachineCode order by DoCreation) as g
    from (
      select t.*, case when MachineStateCode in (2, 3) then 1 else 0 end as i
      from MachineStateChanges t
    ) x
  ) y
) z

Result:

 MachineCode  DoCreation                   MachineStateCode  i  g  f   MachineStateCodeAdjusted 
 ------------ ---------------------------- ----------------- -- -- --- ------------------------ 
 DM139        2024-04-03 00:32:17.3770000  99                0  0  99  99                       
 DM139        2024-04-03 00:32:49.0800000  0                 0  0  99  0                        
 DM139        2024-04-03 01:51:14.4270000  99                0  0  99  99                       
 DM139        2024-04-03 01:51:45.6430000  0                 0  0  99  0                        
 DM139        2024-04-03 06:07:38.3570000  3                 1  1  3   3                        
 DM139        2024-04-03 06:07:39.0430000  0                 0  1  3   3                        
 DM139        2024-04-03 06:23:37.4330000  100               0  1  3   3                        
 DM139        2024-04-03 06:23:43.6970000  0                 0  1  3   3                        
 DM139        2024-04-03 06:24:46.1530000  1                 0  1  3   3                        
 DM139        2024-04-03 06:25:02.4670000  0                 0  1  3   3                        
 DM139        2024-04-03 06:46:50.4100000  100               0  1  3   3                        
 DM139        2024-04-03 06:47:29.2470000  0                 0  1  3   3                        
 DM139        2024-04-03 06:47:46.3970000  100               0  1  3   3                        
 DM139        2024-04-03 06:53:47.1600000  0                 0  1  3   3                        
 DM139        2024-04-03 06:59:41.6330000  1                 0  1  3   3                        
 DM139        2024-04-03 06:59:54.5500000  0                 0  1  3   3                        
 DM139        2024-04-03 07:00:03.2030000  1                 0  1  3   3                        
 DM139        2024-04-03 07:00:06.4370000  0                 0  1  3   3                        
 DM139        2024-04-03 07:00:14.2470000  1                 0  1  3   3                        
 DM139        2024-04-03 07:00:17.6830000  2                 1  2  2   2                        
 DM139        2024-04-03 07:00:18.1530000  1                 0  2  2   1                        
 DM139        2024-04-03 07:00:18.8400000  2                 1  3  2   2                        
 DM139        2024-04-03 07:00:26.8600000  1                 0  3  2   1                        
 DM139        2024-04-03 07:00:46.3870000  2                 1  4  2   2                        

See running example at db<>fiddle.

Sign up to request clarification or add additional context in comments.

Comments

2

In SQL Server 2022 onwards (or in Azure) you can use IGNORE NULLS with LAST_VALUE() (as can RedShift).

This means you can easily read the last code 2 or code 3, then use that in a CASE expression.

select
  *,
  case when last_23 = 3 then 3 else MachineStateCode END
from
(
  select
    *,
    LAST_VALUE(
      CASE WHEN MachineStateCode IN (2,3) THEN MachineStateCode END
    )
    IGNORE NULLS 
    OVER (
      PARTITION BY MachineCode
          ORDER BY DoCreation
    )
      AS last_23
  from
    MachineStateChanges
)
  AS look_behind

fiddle

Comments

1

You likely want to apply following rules for the new column MachineStateCodeAdjusted:

  • If MachineStateCode is 2 or 3, copy it

  • Otherwise, find the first row before the current row (sorted by the date) whose MachineStateCode is 2 or 3. If this previous row had code 2 or if no such previous row can be found, then copy the MachineStateCode of the current row, otherwise take 3.

One way to implement this logic is this query:

WITH sub AS (
    SELECT *, MachineStateCode AS MachineStateCodeAdjusted
    FROM yourtable
    WHERE MachineStateCode IN (2,3)
)
SELECT *
FROM sub
UNION ALL
SELECT *,  
  COALESCE((SELECT TOP 1 CASE WHEN MachineStateCodeAdjusted = 3 THEN 3 
    ELSE y.MachineStateCode END FROM sub 
    WHERE sub.DoCreation <= y.DoCreation ORDER BY DoCreation DESC), y.MachineStateCode)
FROM yourtable y
WHERE MachineStateCode NOT IN (2,3)
ORDER BY MachineCode, DoCreation;

The idea behind:

  • In the query "sub" (rename it if you want and know a better name), all rows with MachineStateCode 2 or 3 will be selected, their MachineStateCode is simply copied as MachineStateCodeAdjusted.

  • The main query selects all rows whose MachineStateCode is not 2 or 3. For those rows, it will be checked whether previous rows (having an earlier date) exist with MachineStateCode 2 or 3.

  • If such rows exist, take the first (by date) before the current row and check if this row had MachineStateCode 2 or 3. If it was 2, then copy the MachineStateCode, else take 3 as MachineStateCodeAdjusted.

  • If no such rows exist, just copy the MachineStateCode as MachineStateCodeAdjusted.

  • Sort the entire result by MachineCode and date

Note you might need to extend the query if you would like to build up further cases or to make the logic and result depend on the current MachineCode (first column). Then, you could for example add a condition that the MachineCode in main query and subquery should match if this is intended.

See this db<>fiddle with your sample data.

1 Comment

Your solution is far more compact than mine.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.