SAS grouping algorithm

Question

I have the following mock up table

#n a b group
1  1 1  1
2  1 2  1
3  2 2  1
4  2 3  1
5  3 4  2
6  3 5  2
7  4 5  2

I am using SAS for this problem. In the group column, rows that are interconnected through a and b are grouped together. I will try to explain why these rows are in the same group:

Rows 1 and 2 are in group 1 since they both have a = 1
Row 3 is in group 1 since b = 2 in rows 2 and 3, and row 2 is in group 1
Rows 3 and 4 are in group 1 since a = 2 in both rows and row 3 is in group 1

The overall logic is that if a row x contains the same value of a or b as a row y, then row x belongs to the same group as row y. Following the same logic, rows 5, 6 and 7 are in group 2.

Is there any way to make an algorithm to find these groups?

What do you want to happen if there is another observation with a=4 and b=2? Would that mean that there is only one group? Or do you only want to process the rows in a, b order, so that it comes between rows 6 and 7 and causes there to be four groups?
– Tom, Commented May 12, 2018 at 15:40
Are a and b always increasing for each successive row? If yes, then Richard's answer will work, but if not then this is a much trickier problem that will involve making multiple passes through your data to identify connected components.
– user667489, Commented May 13, 2018 at 10:38

Richard · Accepted Answer · 2018-09-12 10:54:02Z

Case I:

Grouping is defined as item linkage within contiguous rows.

Use the LAG function to examine both variables' prior values. Increase the group value if both have changed. For example:

group + ( a ne lag(a) and b ne lag(b) );
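
A minimal, self-contained sketch of Case I, assuming the question's table is loaded into a data set named have (the names have, want, prev_a and prev_b are illustrative and not part of the original answer). It stores the lagged values in temporary variables first, so each LAG queue is updated exactly once per row regardless of how the comparison is evaluated:

```sas
/* Mock-up data from the question (data set name HAVE is an assumption) */
data have;
  input n a b;
  datalines;
1 1 1
2 1 2
3 2 2
4 2 3
5 3 4
6 3 5
7 4 5
;
run;

data want;
  set have;
  /* Call LAG unconditionally, once per variable per row */
  prev_a = lag(a);
  prev_b = lag(b);
  /* Sum statement: group is retained, starts at 0, and grows by 1
     whenever both a and b differ from the previous row */
  group + (a ne prev_a and b ne prev_b);
  drop prev_a prev_b;
run;
```

On the first row both lagged values are missing, so the comparison is true and the numbering starts at 1. For the sample data this puts rows 1 to 4 in group 1 and rows 5 to 7 in group 2.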

Case II:

Grouping is determined from pair item slot value linkages over all data.

From grouping pairs by either key

General statement of problem:
-----------------------------
Given: P = p{i} = (p{i,1},p{i,2}), a set of pairs (key1, key2).

Find: The distinct groups, G = g{x}, of P,
      such that each pair p in a group g has this property:

      key1 matches key1 of any other pair in g.
      -or-
      key2 matches key2 of any other pair in g.

Demonstrates

… an iterative way using hashes. Two hashes maintain the groupId assigned to each key value. Two additional hashes maintain group mapping paths. Once a full pass over the data causes no new mapping, the groups have been fully determined. A final pass then assigns the groupIds to each pair and outputs the data to a table.
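
One way to sketch the Case II idea in a single DATA step is shown here. It is a simplified variant, not the linked demonstration: it uses only two key hashes plus one renumbering hash, and it repeatedly propagates the smallest row number across linked keys until a pass changes nothing. It assumes numeric a and b in a data set named have; the names ga, gb, lbl, ngroups and want are illustrative.

```sas
/* Mock-up data from the question (data set name HAVE is an assumption) */
data have;
  input n a b;
  datalines;
1 1 1
2 1 2
3 2 2
4 2 3
5 3 4
6 3 5
7 4 5
;
run;

data want(keep=n a b group);
  if 0 then set have;                 /* put n, a, b first in the PDV */
  length ga gb lbl group 8;

  /* ha: a-value -> current label, hb: b-value -> current label */
  declare hash ha(); ha.defineKey('a');   ha.defineData('ga');    ha.defineDone();
  declare hash hb(); hb.defineKey('b');   hb.defineData('gb');    hb.defineDone();
  /* hg: final label -> consecutive group number */
  declare hash hg(); hg.defineKey('lbl'); hg.defineData('group'); hg.defineDone();

  /* Repeat full passes until no label changes */
  do until (not changed);
    changed = 0;
    do i = 1 to nobs;
      set have point=i nobs=nobs;
      if ha.find() ne 0 then ga = .;  /* a not seen yet */
      if hb.find() ne 0 then gb = .;  /* b not seen yet */
      lbl = min(ga, gb, i);           /* MIN ignores missing values */
      if ga ne lbl then do; ga = lbl; ha.replace(); changed = 1; end;
      if gb ne lbl then do; gb = lbl; hb.replace(); changed = 1; end;
    end;
  end;

  /* Final pass: renumber labels 1, 2, ... in order of first appearance */
  do i = 1 to nobs;
    set have point=i;
    rc = ha.find();                   /* loads the final label into ga */
    lbl = ga;
    if hg.find() ne 0 then do;        /* first row of a new group */
      ngroups + 1;
      group = ngroups;
      hg.add();
    end;
    output;
  end;
  stop;
run;
```

With the sample data the loop settles after two passes and yields group 1 for rows 1 to 4 and group 2 for rows 5 to 7. Adding a row such as a=4, b=2 needs several more passes and merges everything into one group, which is the multi-pass behaviour mentioned in the comments on the question.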

edited Sep 12, 2018 at 10:54

answered May 12, 2018 at 11:22

Richard


2 Comments

user667489 Over a year ago

This only works if a and b are both monotonically increasing with respect to n. Anything like Tom's example or e.g. n=8 a=5 b=4 would not be handled correctly.

Richard Over a year ago

My answer does not depend on monotonicity; it presumes the desired grouping is by row-wise contiguity (within a pair item slot), similar in vein to BY ... NOTSORTED.
