
I have the following mock-up table:

#n  a  b  group
1   1  1  1
2   1  2  1
3   2  2  1
4   2  3  1
5   3  4  2
6   3  5  2
7   4  5  2

I am using SAS for this problem. The group column assigns the same value to rows that are interconnected through a and b. Here is why the first four rows are in the same group:

  • rows 1 and 2 are in group 1 since they both have a = 1
  • row 3 is in group 1 since b = 2 in both rows 2 and 3, and row 2 is in group 1
  • row 4 is in group 1 since a = 2 in both rows 3 and 4, and row 3 is in group 1

The overall logic is that if row x contains the same value of a or b as row y, then row x belongs to the same group as row y. Following the same logic, rows 5, 6, and 7 are in group 2.

Is there any way to make an algorithm to find these groups?

  • What do you want to happen if there is another observation with a=4 and b=2? Would that mean that there is only one group? or do you only want to process the rows in order A,B so that it will come between rows 6 and 7 and cause there to be four groups? Commented May 12, 2018 at 15:40
  • Are a and b always increasing for each successive row? If yes, then Richard's answer will work, but if not then this is a much trickier problem that will involve making multiple passes through your data to identify connected components. Commented May 13, 2018 at 10:38

1 Answer

Case I:

Grouping is defined as item linkage within contiguous rows.

Use the LAG function to examine both variables' prior values. Increase the group value if both have changed. For example:

group + ( a ne lag(a) and b ne lag(b) );
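To make the row-by-row logic concrete, here is a minimal sketch in Python rather than SAS, written only to illustrate the idea. The function name `contiguous_groups` and the `(a, b)` tuple layout are assumptions for this example, not part of the answer. It uses `None` as the "previous" value for the first row, mirroring how LAG returns a missing value on the first observation.

```python
def contiguous_groups(rows):
    """Assign a group number to each (a, b) row, in order.

    A new group starts only when BOTH a and b differ from the
    values in the immediately preceding row.
    """
    groups = []
    group = 0
    prev_a = prev_b = None  # stands in for the missing value LAG gives on row 1
    for a, b in rows:
        if a != prev_a and b != prev_b:
            group += 1
        groups.append(group)
        prev_a, prev_b = a, b
    return groups


# The mock-up table from the question, as (a, b) pairs.
rows = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(contiguous_groups(rows))
```

On the question's table this reproduces the group column. Because only the immediately preceding row is inspected, a later row that links back to an earlier group is not merged into it; that is the situation Case II addresses.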

Case II:

Grouping is determined from linkages between pairs that share a value in the same item slot (a with a, b with b), considered over all of the data.

From grouping pairs by either key

General statement of problem:
-----------------------------
Given: P = p{i} = (p{i,1},p{i,2}), a set of pairs (key1, key2).

Find: The distinct groups, G = g{x}, of P,
      such that each pair p in a group g has this property:

      key1 matches key1 of any other pair in g.
      -or-
      key2 matches key2 of any other pair in g.

Demonstrates

… an iterative way using hashes. Two hashes maintain the groupId assigned to each key value. Two additional hashes are used to maintain group mapping paths. When the data can be passed without causing a mapping, then the groups have been fully determined. A final pass is done, at which point the groupIds are assigned to each pair and the data is output to a table.
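The iterative idea can be sketched as follows. This is a simplified Python illustration, not the linked SAS program: it keeps only two dictionaries (one group id per key value in each slot) and omits the two path-mapping hashes mentioned above, repeating passes until a pass changes nothing. The names `linked_groups`, `group_of_a`, and `group_of_b` are hypothetical.

```python
def linked_groups(pairs):
    """Group (a, b) pairs that are linked through a shared a or a shared b."""
    group_of_a = {}  # key1 value -> group id
    group_of_b = {}  # key2 value -> group id
    next_id = 0
    changed = True
    while changed:  # repeat until a full pass causes no remapping
        changed = False
        for a, b in pairs:
            ga = group_of_a.get(a)
            gb = group_of_b.get(b)
            known = [g for g in (ga, gb) if g is not None]
            if known:
                g = min(known)  # merge toward the smaller id
            else:
                next_id += 1    # neither key seen yet: start a new group
                g = next_id
            if ga != g:
                group_of_a[a] = g
                changed = True
            if gb != g:
                group_of_b[b] = g
                changed = True
    # Final pass: read off each pair's group, renumbered 1, 2, ... in
    # order of first appearance.
    relabel = {}
    return [relabel.setdefault(group_of_a[a], len(relabel) + 1) for a, _ in pairs]


rows = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(linked_groups(rows))
print(linked_groups(rows + [(4, 2)]))  # extra pair a=4, b=2 from the comments
```

With the original seven rows this yields two groups; adding the pair a=4, b=2 from the comments links everything into a single group. Row order does not affect which rows end up together, only how many passes are needed.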


2 Comments

This only works if a and b are both monotonically increasing with respect to n. Anything like Tom's example or e.g. n=8 a=5 b=4 would not be handled correctly.
My answer does not depend on monotonicity; it presumes the grouping desired is by row-wise contiguity (within a pair item slot) -- similar in vein to BY ... NOTSORTED
