I have a table of relationships between contracts with two variables: ID1 (refinanced contract) and ID2 (refinancing contract). I want to create a variable that groups all rows of N to M refinancings. For example, if we have:
ID1 ID2
A Z
A Y
A X
B Z
B Y
C W
D W
E V
E U
F T
I want to create a variable such that:
ID1 ID2 Group
A Z 1
A Y 1
A X 1
B Z 1
B Y 1
C W 2
D W 2
E V 3
E U 3
F T 4
The logic is the following:
- All rows with the same ID1 should have the same value of Group.
- All rows with the same ID2 should have the same value of Group.
- The two above conditions have to be combined. In the example, since A->Z and also B->Z, then all rows with ID1 = A or ID1 = B should have the same value of Group, since they share at least one ID2. Conversely, all rows with ID2 = Y or ID2 = X should have the same value of Group, since they share at least one ID1 (=A).
A particular case which is difficult to treat is the following:
ID1 ID2
A X
B X
B Y
C Y
C Z
D Z
All these rows should have the same value of Group, because:
- Since A->X and also B->X, all rows with ID1 = A and ID1 = B should have the same value of Group.
- Since B->Y and also C->Y, all rows with ID1 = B and ID1 = C should have the same value of Group, which should be the same value as the rows with ID1 = A.
- Since C->Z and also D->Z, all rows with ID1 = C and ID1 = D should have the same value of Group, which should be the same value as the rows with ID1 = A and ID1 = B.
However, I can't see how to do it without iteratively doing joins that substantially increase the table size when programming. Since my table contains millions of rows, it is not feasible to apply this iterative logic. Could you help me with a more optimized method?



PROC OPTGRAPH/PROC OPTNETWORK can group this for you in very little code. Otherwise, you're going to be stuck doing a lot of iterative work with hash tables or SQL joins.
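
For anyone who wants to see the logic behind the grouping: what the question describes looks like finding connected components in a graph where each row is a link between an ID1 node and an ID2 node. Below is a minimal sketch of that idea using a union-find structure. It is written in Python rather than SAS, purely as an illustration, and it is not what the procedures above do internally. The function name `assign_groups` and the `("ID1", ...)` / `("ID2", ...)` tags are made up for this example. The tags assume that the same value appearing in both columns should not link rows on its own; drop the tags if it should.

```python
def assign_groups(rows):
    """Return one group number per (id1, id2) row, numbered by first appearance."""
    parent = {}  # union-find forest: node -> parent node

    def find(x):
        # Register unseen nodes as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path straight at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Each row links its ID1 node to its ID2 node (tags are an assumption,
    # they keep the two columns in separate namespaces).
    for id1, id2 in rows:
        union(("ID1", id1), ("ID2", id2))

    # Second pass: translate each component root into a running group number.
    labels = {}
    groups = []
    for id1, _ in rows:
        root = find(("ID1", id1))
        groups.append(labels.setdefault(root, len(labels) + 1))
    return groups


# First example from the question.
rows = [("A", "Z"), ("A", "Y"), ("A", "X"), ("B", "Z"), ("B", "Y"),
        ("C", "W"), ("D", "W"), ("E", "V"), ("E", "U"), ("F", "T")]
print(assign_groups(rows))  # [1, 1, 1, 1, 1, 2, 2, 3, 3, 4]
```

This needs only two passes over the rows and no joins, so the table never grows. The chained case from the question (A->X, B->X, B->Y, C->Y, C->Z, D->Z) ends up in a single group because each union merges whole components, not just pairs of rows.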