I have a dataset structured as follows which contains thousands of materials and millions of rows:
| Material | Date | Stock | Demand | Supply | Projected Stock | Target Stock | Overstock |
|---|---|---|---|---|---|---|---|
| 123456 | 24/06/2024 | 60 | 0 | 0 | 60 | 24 | Yes |
| 123456 | 26/06/2024 | 0 | 4 | 0 | 56 | 24 | Yes |
| 123456 | 27/06/2024 | 0 | 4 | 0 | 52 | 24 | Yes |
| 123456 | 02/07/2024 | 0 | 0 | 10 | 62 | 24 | Yes |
| 123456 | 04/07/2024 | 0 | 0 | 10 | 72 | 24 | Yes |
| 123456 | 04/07/2024 | 0 | 0 | 1 | 73 | 24 | Yes |
| 123456 | 04/07/2024 | 0 | 1 | 0 | 72 | 24 | Yes |
| 123456 | 04/07/2024 | 0 | 1 | 0 | 71 | 24 | Yes |
| 123456 | 04/07/2024 | 0 | 4 | 0 | 67 | 24 | Yes |
| 123456 | 04/07/2024 | 0 | 4 | 0 | 63 | 24 | Yes |
| 123456 | 05/07/2024 | 0 | 4 | 0 | 59 | 24 | Yes |
| 123456 | 11/07/2024 | 0 | 0 | 11 | 70 | 24 | Yes |
| 123456 | 13/07/2024 | 0 | 1 | 0 | 69 | 24 | Yes |
| 123456 | 13/07/2024 | 0 | 1 | 0 | 68 | 24 | Yes |
| 123456 | 13/07/2024 | 0 | 1 | 0 | 67 | 24 | Yes |
| 123456 | 14/07/2024 | 0 | 4 | 0 | 63 | 24 | Yes |
| 123456 | 16/07/2024 | 0 | 4 | 0 | 59 | 24 | Yes |
| 123456 | 18/07/2024 | 0 | 0 | 11 | 70 | 24 | Yes |
| 123456 | 18/07/2024 | 0 | 4 | 0 | 66 | 24 | Yes |
| 123456 | 19/07/2024 | 0 | 1 | 0 | 65 | 24 | Yes |
| 123456 | 19/07/2024 | 0 | 1 | 0 | 64 | 24 | Yes |
| 123456 | 21/07/2024 | 0 | 4 | 0 | 60 | 24 | Yes |
| 123456 | 21/07/2024 | 0 | 1 | 0 | 59 | 24 | Yes |
| 123456 | 21/07/2024 | 0 | 1 | 0 | 58 | 24 | Yes |
| 123456 | 21/07/2024 | 0 | 1 | 0 | 57 | 24 | Yes |
| 123456 | 25/07/2024 | 0 | 0 | 9 | 66 | 24 | Yes |
| 123456 | 25/07/2024 | 0 | 4 | 0 | 62 | 24 | Yes |
| 123456 | 30/07/2024 | 0 | 0 | 9 | 71 | 24 | Yes |
| 123456 | 30/07/2024 | 0 | 0 | 9 | 80 | 24 | Yes |
| 123456 | 30/07/2024 | 0 | 0 | 9 | 89 | 24 | Yes |
| 123456 | 30/07/2024 | 0 | 0 | 9 | 98 | 24 | Yes |
| 123456 | 30/07/2024 | 0 | 0 | 9 | 107 | 24 | Yes |
| …. | …. | …. | …. | …. | …. | …. | …. |
| 123457 | 24/06/2024 | 30 | 0 | 0 | 30 | 24 | No |
| 123457 | 26/06/2024 | 0 | 4 | 0 | 26 | 24 | No |
| 123457 | 27/06/2024 | 0 | 4 | 0 | 22 | 24 | No |
| 123457 | 02/07/2024 | 0 | 0 | 10 | 32 | 24 | No |
| 123457 | 04/07/2024 | 0 | 0 | 10 | 42 | 24 | No |
| 123457 | 04/07/2024 | 0 | 0 | 1 | 43 | 24 | No |
| 123457 | 04/07/2024 | 0 | 1 | 0 | 42 | 24 | No |
| 123457 | 04/07/2024 | 0 | 1 | 0 | 41 | 24 | No |
| 123457 | 04/07/2024 | 0 | 4 | 0 | 37 | 24 | No |
| 123457 | 04/07/2024 | 0 | 4 | 0 | 33 | 24 | No |
| 123457 | 05/07/2024 | 0 | 4 | 0 | 29 | 24 | No |
| 123457 | 11/07/2024 | 0 | 0 | 11 | 40 | 24 | No |
| 123457 | 13/07/2024 | 0 | 1 | 0 | 39 | 24 | No |
| 123457 | 13/07/2024 | 0 | 1 | 0 | 38 | 24 | No |
| 123457 | 13/07/2024 | 0 | 1 | 0 | 37 | 24 | No |
| 123457 | 14/07/2024 | 0 | 4 | 0 | 33 | 24 | No |
| 123457 | 16/07/2024 | 0 | 4 | 0 | 29 | 24 | No |
| 123457 | 18/07/2024 | 0 | 0 | 11 | 40 | 24 | No |
| 123457 | 18/07/2024 | 0 | 4 | 0 | 36 | 24 | No |
| 123457 | 19/07/2024 | 0 | 1 | 0 | 35 | 24 | No |
| 123457 | 19/07/2024 | 0 | 1 | 0 | 34 | 24 | No |
| 123457 | 21/07/2024 | 0 | 4 | 0 | 30 | 24 | No |
| 123457 | 21/07/2024 | 0 | 1 | 0 | 29 | 24 | No |
| 123457 | 21/07/2024 | 0 | 1 | 0 | 28 | 24 | No |
| 123457 | 21/07/2024 | 0 | 1 | 0 | 27 | 24 | No |
| 123457 | 25/07/2024 | 0 | 0 | 9 | 36 | 24 | No |
| 123457 | 25/07/2024 | 0 | 4 | 0 | 32 | 24 | No |
| 123457 | 30/07/2024 | 0 | 0 | 9 | 41 | 24 | No |
| 123457 | 30/07/2024 | 0 | 0 | 9 | 50 | 24 | ? |
| 123457 | 30/07/2024 | 0 | 0 | 9 | 59 | 24 | ? |
| 123457 | 30/07/2024 | 0 | 0 | 9 | 68 | 24 | ? |
| 123457 | 30/07/2024 | 0 | 0 | 9 | 77 | 24 | ? |
| …. | …. | …. | …. | …. | …. | …. | …. |
Because the date column skips and repeats some dates, it is difficult for me to do what I want.
I need to calculate whether the projected stock is 200% or more for every 28 consecutive days. So in the example above, the projected stock is 200% above the target stock (24*200%=48), so there needs to be a column which checks the current row, and looks in the next 28 days (so not 28 rows) and if every day of those 28 days the projected stock is above 48, then the relevant material is flagged. For reference, my query currently uses CTE to convert the raw data into below dataset.
Projected_stock as (
select
Material,
Date,
Stock,
Target_Stock,
Demand,
Supply,
sum(Stock - Demand + Supply) over (partition by material order by date, rows between unbounded preceding and current row) as Projected_Stock
from datasource
group by all
order by material,date)
Thank you.
Edit: I am using databricks and what I expect to see is that for this material 123456, next to the column Target_Stock, there is a another column Overstock, which has a flag on all rows, because therea are more than 28 consecutive days where the projected stock is above 200% of the target stock (48)