-1

I have a dataset structured as follows which contains thousands of materials and millions of rows:

Material Date Stock Demand Supply Projected Stock Target Stock Overstock
123456 24/06/2024 60 0 0 60 24 Yes
123456 26/06/2024 0 4 0 56 24 Yes
123456 27/06/2024 0 4 0 52 24 Yes
123456 02/07/2024 0 0 10 62 24 Yes
123456 04/07/2024 0 0 10 72 24 Yes
123456 04/07/2024 0 0 1 73 24 Yes
123456 04/07/2024 0 1 0 72 24 Yes
123456 04/07/2024 0 1 0 71 24 Yes
123456 04/07/2024 0 4 0 67 24 Yes
123456 04/07/2024 0 4 0 63 24 Yes
123456 05/07/2024 0 4 0 59 24 Yes
123456 11/07/2024 0 0 11 70 24 Yes
123456 13/07/2024 0 1 0 69 24 Yes
123456 13/07/2024 0 1 0 68 24 Yes
123456 13/07/2024 0 1 0 67 24 Yes
123456 14/07/2024 0 4 0 63 24 Yes
123456 16/07/2024 0 4 0 59 24 Yes
123456 18/07/2024 0 0 11 70 24 Yes
123456 18/07/2024 0 4 0 66 24 Yes
123456 19/07/2024 0 1 0 65 24 Yes
123456 19/07/2024 0 1 0 64 24 Yes
123456 21/07/2024 0 4 0 60 24 Yes
123456 21/07/2024 0 1 0 59 24 Yes
123456 21/07/2024 0 1 0 58 24 Yes
123456 21/07/2024 0 1 0 57 24 Yes
123456 25/07/2024 0 0 9 66 24 Yes
123456 25/07/2024 0 4 0 62 24 Yes
123456 30/07/2024 0 0 9 71 24 Yes
123456 30/07/2024 0 0 9 80 24 Yes
123456 30/07/2024 0 0 9 89 24 Yes
123456 30/07/2024 0 0 9 98 24 Yes
123456 30/07/2024 0 0 9 107 24 Yes
…. …. …. …. …. …. …. ….
123457 24/06/2024 30 0 0 30 24 No
123457 26/06/2024 0 4 0 26 24 No
123457 27/06/2024 0 4 0 22 24 No
123457 02/07/2024 0 0 10 32 24 No
123457 04/07/2024 0 0 10 42 24 No
123457 04/07/2024 0 0 1 43 24 No
123457 04/07/2024 0 1 0 42 24 No
123457 04/07/2024 0 1 0 41 24 No
123457 04/07/2024 0 4 0 37 24 No
123457 04/07/2024 0 4 0 33 24 No
123457 05/07/2024 0 4 0 29 24 No
123457 11/07/2024 0 0 11 40 24 No
123457 13/07/2024 0 1 0 39 24 No
123457 13/07/2024 0 1 0 38 24 No
123457 13/07/2024 0 1 0 37 24 No
123457 14/07/2024 0 4 0 33 24 No
123457 16/07/2024 0 4 0 29 24 No
123457 18/07/2024 0 0 11 40 24 No
123457 18/07/2024 0 4 0 36 24 No
123457 19/07/2024 0 1 0 35 24 No
123457 19/07/2024 0 1 0 34 24 No
123457 21/07/2024 0 4 0 30 24 No
123457 21/07/2024 0 1 0 29 24 No
123457 21/07/2024 0 1 0 28 24 No
123457 21/07/2024 0 1 0 27 24 No
123457 25/07/2024 0 0 9 36 24 No
123457 25/07/2024 0 4 0 32 24 No
123457 30/07/2024 0 0 9 41 24 No
123457 30/07/2024 0 0 9 50 24 ?
123457 30/07/2024 0 0 9 59 24 ?
123457 30/07/2024 0 0 9 68 24 ?
123457 30/07/2024 0 0 9 77 24 ?
…. …. …. …. …. …. …. ….

Because the date column skips and repeats some dates, it is difficult for me to do what I want.

I need to calculate whether the projected stock is 200% or more for every 28 consecutive days. So in the example above, the projected stock is 200% above the target stock (24*200%=48), so there needs to be a column which checks the current row, and looks in the next 28 days (so not 28 rows) and if every day of those 28 days the projected stock is above 48, then the relevant material is flagged. For reference, my query currently uses CTE to convert the raw data into below dataset.

 Projected_stock as (
    select 
      Material,
      Date,
      Stock,
      Target_Stock,
      Demand,
      Supply,
      sum(Stock - Demand + Supply) over (partition by material order by date,  rows between unbounded preceding and current row) as Projected_Stock
    from datasource
    group by all
    order by material,date)

Thank you.

Edit: I am using databricks and what I expect to see is that for this material 123456, next to the column Target_Stock, there is a another column Overstock, which has a flag on all rows, because therea are more than 28 consecutive days where the projected stock is above 200% of the target stock (48)

7
  • Please tag your DBMS, and the output you are expecting Commented Jun 25, 2024 at 11:03
  • Which rows should have the Overstock flag? Can you include an id or row_number column that uniquely identifies each row? Commented Jun 25, 2024 at 11:32
  • 1
    I added the column and another material. The unique ID is the Material, and basically all rows should have Overstock flag if there are 28 consecutive days of overstock in any period which is present in the dataset. Commented Jun 25, 2024 at 11:43
  • Why do you have Overstock=NO for material 123457 and date 30/07/2024 ? Only 1 of the 4 records with that date are below 48. Commented Jun 25, 2024 at 11:57
  • 1
    You are right, should be ? because we don't know whether in the next 28 days it will remain below or above Commented Jun 25, 2024 at 12:09

2 Answers 2

0

I think you probably need something like this. Date arithmetic is Oracle.

WITH refdate  as
(
    SELECT material, date, max(target_stock) target
    FROM Projected_stock c
    GROUP BY material, date
)
select material,date
       ,case when times_over > 0 and times_under = 0 then 'Overstocked'
             when times_over = 0 then 'Not overstocked'
             when times_over > 0 and times_under > 0 then 'Sometimes overstocked'  
from                
  (SELECT r.material,r.date
         ,count(case when p.projected_stock > 2 * target then 1 end) times_over
         ,count(case when p.projected_stock < 2 * target then 1 end) times_under
  FROM refdate r
  JOIN Projected_stock p ON p.material=c.material and p.date between r.date+1 and r.date + 28
  group by r.material,r.date
  )

Unfortunately, with missing dates and possibly more than one entry per date, I don't think you can use an analytic function, so you are forced to joint your projected stock table to itself.

Sign up to request clarification or add additional context in comments.

Comments

-1

Postgres and MySQL versions of the solution. DateAdd function may vary for you.

PostGres fiddle

    WITH  Projected_stock as 
    (
        select 
          Material,
          Date,
          Stock,
          Target_Stock,
          Demand,
          Supply,
          sum(Stock - Demand + Supply) over (partition by material order by date,  rows between unbounded preceding and current row) as Projected_Stock
        from datasource
        group by all
        order by material,date),
    CTE  as
        (
            SELECT material, c.date, 
            CASE 
               WHEN MIN(CASE WHEN projected_stock>48 THEN '1' ELSE '0' END)!=MAX(CASE WHEN projected_stock>48 THEN '1' ELSE '0' END) THEN '?' 
               ELSE MIN(CASE WHEN projected_stock>48 THEN '1' ELSE '0' END) 
            END as Overstock
            FROM Projected_stock c
            WHERE c.date between  c.date AND  c.date + INTERVAL '28 day' --dateadd(DAY, 28, c.date)
            GROUP BY material, date
        ),
        SELECT p.*, c.overstock
        FROM CTE c
        LEFT JOIN Projected_stock p ON p.date=c.date and c.material=p.material

MySQL fiddle

    WITH  Projected_stock as 
    (
        select 
          Material,
          Date,
          Stock,
          Target_Stock,
          Demand,
          Supply,
          sum(Stock - Demand + Supply) over (partition by material order by date,  rows between unbounded preceding and current row) as Projected_Stock
        from datasource
        group by all
        order by material,date), CTE  as
        (
            SELECT material, c.Date, 
            CASE 
               WHEN MIN(CASE WHEN Projected>48 THEN '1' ELSE '0' END)!=MAX(CASE WHEN Projected>48 THEN '1' ELSE '0' END) THEN '?' 
               ELSE MIN(CASE WHEN Projected>48 THEN '1' ELSE '0' END) 
            END as Overstock
            FROM Projected_stock c
            WHERE c.Date between  c.Date AND  DATE_ADD(c.Date, INTERVAL 28 DAY) 
            GROUP BY c.material, c.Date
        ),
        SELECT p.*, c.overstock
        FROM CTE c
        LEFT JOIN Projected_stock p ON p.Date=c.Date and c.material=p.material

5 Comments

"Links to external resources are encouraged" (see: How do I write a good answer?). Currently only some SQL statement is shown, and the reader must find out why this is supposed to work (or not). (BTW: I am not the downvoter (this time 😉))
Queries updated to include Projected_Stock from datasource
Thanks Bart, for me this works and is exactly what I needed. The original version did the job, not sure why you needed to improve it?
Which dateadd did you use?
First I used + INTERVAL '28 day' and it worked. Now I tried DATE_ADD(c.Date, INTERVAL 28 DAY) but this gives an error DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.