I have this table (test.mytable in the SQL script below):
CREATE OR REPLACE TABLE test.mytable (item STRING(1), I_groupe STRING(1));
INSERT INTO test.mytable (item, I_groupe)
values
('A', '1'),
('B', '1'),
('B', '2'),
('C', '2'),
('D', '3');
| item | Intermediate_group |
|---|---|
| A | 1 |
| B | 1 |
| B | 2 |
| C | 2 |
| D | 3 |
My goal is to group the items together. My expected result is:
| item | Final_group |
|---|---|
| A,B,C | 1 |
| D | 2 |
I would like to group items A and B because they have at least one Intermediate_group in common (Intermediate_group 1). Then I would like to group A,B with C because B and C have an Intermediate_group in common (Intermediate_group 2). Item D has no Intermediate_group in common with any other item, so it is alone in its final group.
I have this code:
WITH TEMP1 AS (
SELECT *
FROM (
select item as item_1,
array_agg(distinct I_groupe) as I_groupe1
from test.mytable
group by item_1) AS AA
cross join
(select item as item_2,
array_agg(distinct I_groupe) as I_groupe2
from test.mytable
group by item_2
) AS BB
)
,
TEMP2 AS (
SELECT item_1, item_2,
ARRAY(SELECT g FROM UNNEST(I_groupe1) AS g
INTERSECT DISTINCT
(SELECT g FROM UNNEST(I_groupe2) AS g)
) AS result
FROM TEMP1
)
,
TEMP3 AS (
SELECT item_1, item_2, test
FROM TEMP2, unnest(result) as test
)
,
TEMP4 AS (
SELECT STRING_AGG(DISTINCT item_2) as item, STRING_AGG(CAST(test AS STRING)) as I_groupe
FROM TEMP3
GROUP BY item_1
)
,
TEMP5 AS (
SELECT item, I_groupe
FROM TEMP4, UNNEST(SPLIT(item)) as item, UNNEST(SPLIT(I_groupe)) as I_groupe
)
I repeat this process manually three times for this "toy" example (each pass takes the output of the previous one as its input) and finish with a SELECT DISTINCT to get only one row per Final_group:
SELECT DISTINCT *
FROM TEMP14
This does not scale to a real dataset, because the number of passes needed depends on the data. I would like to use a recursive query or a loop to automate this process.
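To make the kind of loop I mean concrete, here is a minimal, untested sketch using BigQuery scripting (DECLARE, WHILE, SET). It is not the code above: it swaps in a simpler label-propagation idea, where every item starts with its own name as a label and repeatedly takes the smallest label found among the items it shares an I_groupe with, until nothing changes. The temp tables `labels` and `new_labels` and the variable `changed` are hypothetical names made up for this illustration.

```sql
-- Sketch only: assumes test.mytable(item, I_groupe) as defined above
-- and that the statements run together as one BigQuery script.
DECLARE changed INT64 DEFAULT 1;

-- Start: every item is its own label (hypothetical temp table).
CREATE TEMP TABLE labels AS
SELECT DISTINCT item, item AS label
FROM test.mytable;

WHILE changed > 0 DO
  -- Each item takes the smallest label among all items sharing one of
  -- its intermediate groups (the item itself is always included).
  CREATE OR REPLACE TEMP TABLE new_labels AS
  SELECT t.item, MIN(l.label) AS label
  FROM test.mytable AS t
  JOIN test.mytable AS t2 USING (I_groupe)
  JOIN labels AS l ON l.item = t2.item
  GROUP BY t.item;

  -- Count how many labels moved during this pass.
  SET changed = (
    SELECT COUNT(*)
    FROM labels AS o
    JOIN new_labels AS n USING (item)
    WHERE o.label != n.label
  );

  CREATE OR REPLACE TEMP TABLE labels AS
  SELECT * FROM new_labels;
END WHILE;

-- One row per final group, numbered in label order.
SELECT
  STRING_AGG(item ORDER BY item) AS item,
  DENSE_RANK() OVER (ORDER BY label) AS Final_group
FROM labels
GROUP BY label
ORDER BY Final_group;
```

Tracing it by hand on the toy data, the labels should settle on A for items A, B and C and on D for item D after three passes, which corresponds to the two final groups in my expected result. The self-join on I_groupe can get large when one intermediate group holds many items, so I do not know how well this behaves on real volumes.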
Thanks in advance for your help.