How to get all intersections of sets in Python fast

Question

I would like to compute all (distinct) intersections of a collection of finite sets of integers (here implemented as a list of lists) in Python. To avoid confusion, a formal definition is at the end of the question.

> A = [[0,1,2,3],[0,1,4],[1,2,4],[2,3,4],[0,3,4]]
> all_intersections(A) # desired output
[[], [0], [1], [2], [3], [4], [0, 1], [0, 3], [0, 4], [1, 2], [1, 4], [2, 3], [2, 4], [3, 4], [0, 1, 4], [0, 3, 4], [1, 2, 4], [2, 3, 4], [0, 1, 2, 3]]

I have an algorithm that does it iteratively, but it is rather slow (should I post it?). A test case would be

[[0, 1, 2, 3, 4, 9], [0, 1, 4, 5, 6, 10], [0, 2, 4, 5, 7, 11], [1, 3, 4, 6, 8, 12], [2, 3, 4, 7, 8, 13], [4, 5, 6, 7, 8, 14], [0, 1, 9, 10, 15, 16], [0, 2, 9, 11, 15, 17], [1, 3, 9, 12, 16, 18], [2, 3, 9, 13, 17, 18], [9, 15, 16, 17, 18, 19], [0, 5, 10, 11, 15, 20], [1, 6, 10, 12, 16, 21], [10, 15, 16, 19, 20, 21], [5, 6, 10, 14, 20, 21], [11, 15, 17, 19, 20, 22], [5, 7, 11, 14, 20, 22], [2, 7, 11, 13, 17, 22], [7, 8, 13, 14, 22, 23], [3, 8, 12, 13, 18, 23], [13, 17, 18, 19, 22, 23], [14, 19, 20, 21, 22, 23], [6, 8, 12, 14, 21, 23], [12, 16, 18, 19, 21, 23]]

which takes my algorithm about 2.5 seconds to compute.

Any ideas how to do it fast?

Formal definition (actually hard without latex mode): let A = {A1,...,An} be a finite set of finite sets Ai of non-negative integers. The output should then be the set { intersection of the sets in B : B subset of A }.

So the naive algorithm would be to enumerate every subset of A and collect its intersection. But that clearly takes forever.
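
For reference, the naive approach just described can be sketched as below. This is only a checking aid for small inputs, not part of the original question: the name `all_intersections_bruteforce` is made up here, and it assumes that only non-empty subfamilies are intersected, which is what the desired output above shows.

```python
from itertools import combinations

def all_intersections_bruteforce(lists):
    """Intersect every non-empty subfamily of the input sets.

    Exponential in len(lists); only useful for checking faster code
    on small inputs.
    """
    sets = [frozenset(s) for s in lists]
    found = set()
    for k in range(1, len(sets) + 1):
        for family in combinations(sets, k):
            found.add(frozenset.intersection(*family))
    # Canonical form: sorted sublists, ordered by size and then contents.
    return sorted((sorted(s) for s in found), key=lambda x: (len(x), x))

A = [[0, 1, 2, 3], [0, 1, 4], [1, 2, 4], [2, 3, 4], [0, 3, 4]]
result = all_intersections_bruteforce(A)
print(len(result))
```

With 5 input sets this visits 31 subfamilies; with the 24-set test case it would visit 2**24 - 1, which is why a faster method is needed.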

Many thanks!

This looks more like "all subsets of the union of the inputs".
– kennytm, Commented Jun 3, 2016 at 19:38
What do you even mean by the intersection of two lists? Lists don't have a well-defined intersection operator, though sets do. For example, is [0,1] intersect [1,0] empty? [0,1]? [1,0]? Also, do you mean all intersections of pairs of lists or all intersections of tuples of lists (including triples, etc.)?
– John Coleman, Commented Jun 3, 2016 at 19:42
No, in the above example, [1,3] is a subset of the union of A but it is not an intersection of elements in A, thus not in the output...
– Christian, Commented Jun 3, 2016 at 19:42
@John Coleman: I indeed consider the lists to intersect as sets. Gonna clarify that in a second.
– Christian, Commented Jun 3, 2016 at 19:44

John Coleman · Accepted Answer · 2016-06-04 01:34:49Z

7

Here is a recursive solution. It is almost instantaneous on your test example:

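def allIntersections(frozenSets):
    # Base case: no sets, so no intersections.
    if len(frozenSets) == 0:
        return []
    else:
        head = frozenSets[0]
        tail = frozenSets[1:]
        # All distinct intersections that do not involve head.
        tailIntersections = allIntersections(tail)
        # head alone, everything without head, and head met with each of those.
        newIntersections = [head]
        newIntersections.extend(tailIntersections)
        newIntersections.extend(head & s for s in tailIntersections)
        # Deduplicate; frozensets are hashable, so a set works here.
        return list(set(newIntersections))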

def all_intersections(lists):
    sets = allIntersections([frozenset(s) for s in lists])
    return [list(s) for s in sets]

Edit: Here is a cleaner, nonrecursive implementation of the same idea.

The problem is easiest if you define the intersection of an empty collection of sets to be the universal set, and an adequate universal set can be obtained by taking the union of all the input sets. This is a standard move in lattice theory, and is dual to taking the union of an empty collection of sets to be the empty set. You could always throw away this universal set if you don't want it:

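def allIntersections(frozenSets):
    # Union of all inputs stands in for the intersection of no sets at all.
    # frozenset().union(...) also works when frozenSets is empty.
    universalSet = frozenset().union(*frozenSets)
    intersections = {universalSet}
    for s in frozenSets:
        # Intersect s with everything found so far, then merge the results.
        moreIntersections = set(s & t for t in intersections)
        intersections.update(moreIntersections)
    return intersections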

def all_intersections(lists):
    sets = allIntersections([frozenset(s) for s in lists])
    return [list(s) for s in sets]
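
If you do want to throw the universal set away, one way to do it is sketched below. The helper name `all_intersections_nonempty` is hypothetical. It relies on the observation that the union of the inputs can only be an intersection of a non-empty subfamily when it is itself one of the inputs, so it is discarded only when it is not.

```python
def all_intersections_nonempty(lists):
    """Intersections of non-empty subfamilies only (hypothetical helper)."""
    frozen = [frozenset(s) for s in lists]
    if not frozen:
        return []
    universal = frozenset().union(*frozen)
    intersections = {universal}
    for s in frozen:
        # The right-hand side is built completely before the update happens.
        intersections |= {s & t for t in intersections}
    # Keep the universal set only if it is one of the inputs; otherwise it
    # exists solely because of the empty-family convention.
    if universal not in frozen:
        intersections.discard(universal)
    # Canonical form: sorted sublists, ordered by size and then contents.
    return sorted((sorted(s) for s in intersections), key=lambda x: (len(x), x))

A = [[0, 1, 2, 3], [0, 1, 4], [1, 2, 4], [2, 3, 4], [0, 3, 4]]
result = all_intersections_nonempty(A)
print(result)
```

On the small example from the question this should reproduce the desired output, which does not contain [0, 1, 2, 3, 4].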

The reason that this is so fast with your test example is that, even though your collection has 24 sets, and hence 2**24 (about 16.8 million) potential intersections, there are in fact only 242 distinct intersections (or 241 if you don't count the empty intersection, i.e. the universal set). Thus the number of intersections in each pass through the loop is in the low hundreds at most.
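
To see this on your own data, you could record how many distinct intersections exist after each pass. The sketch below is the same loop with a counter added; `intersection_counts` is a name invented for this illustration.

```python
def intersection_counts(lists):
    """Number of distinct intersections after each pass of the loop."""
    frozen = [frozenset(s) for s in lists]
    intersections = {frozenset().union(*frozen)}  # start from the universal set
    counts = []
    for s in frozen:
        intersections |= {s & t for t in intersections}
        counts.append(len(intersections))
    return counts

A = [[0, 1, 2, 3], [0, 1, 4], [1, 2, 4], [2, 3, 4], [0, 3, 4]]
counts = intersection_counts(A)
print(counts)
```

Each pass does work proportional to the current count, so the running time tracks the number of distinct intersections rather than the number of subfamilies.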

It is possible to pick 24 sets so that all of the 2**24 possible intersections are in fact different, so it is easy to see that the worst-case behavior is exponential. But if, as in your test example, the number of intersections is small, this approach will allow you to rapidly compute them.

A potential optimization might be to sort the sets in increasing size before you loop over them. Processing the smaller sets up front might result in more empty intersections appearing earlier, thus keeping the total number of distinct intersections smaller until towards the end of the loop.
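
A minimal sketch of that ordering idea follows, assuming sorting by `len` is the ordering meant. The name `all_intersections_sorted` is hypothetical. Whether this actually helps depends on the data, so it is worth timing on your own inputs.

```python
def all_intersections_sorted(lists):
    """Same loop as above, but visiting the smallest sets first."""
    frozen = sorted((frozenset(s) for s in lists), key=len)
    intersections = {frozenset().union(*frozen)}
    for s in frozen:
        intersections |= {s & t for t in intersections}
    return intersections

A = [[0, 1, 2, 3], [0, 1, 4], [1, 2, 4], [2, 3, 4], [0, 3, 4]]
result = all_intersections_sorted(A)
print(len(result))
```

The final result does not depend on the processing order, since it is always the full set of intersections; only the intermediate sizes can change.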

edited Jun 4, 2016 at 1:34

answered Jun 3, 2016 at 20:31

John Coleman


3 Comments

Christian Over a year ago

Great! "This is a standard move in lattice-theory": I indeed need that set as well in order to make it a lattice (otherwise, the join is not well defined).

Christian Over a year ago

I do have a followup question, stackoverflow.com/questions/37631049, in case you are interested in thinking about it.

John Coleman Over a year ago

@ChristianStump It does sound interesting, though I'm not sure if I will have sufficient time to think seriously about it today. I have some family commitments.

ShadowRanger · Accepted Answer · 2016-06-03 22:38:39Z

2

Iterative solution that takes about 3.5 ms on my machine for your large test input:

from itertools import starmap, product
from operator import and_

def all_intersections(sets):
    # Convert to set of frozensets for uniquification/type correctness
    last = new = sets = set(map(frozenset, sets))
    # Keep going until further intersections add nothing to results
    while new:
        # Compute intersection of old values with newly found values
        new = set(starmap(and_, product(last, new)))
        last = sets.copy()  # Save off prior state
        new -= last         # Determine truly newly added values
        sets |= new         # Accumulate newly added values in complete set
    # No more intersections being generated, convert results to canonical
    # form, list of lists, where each sublist is displayed in order, and
    # the top level list is ordered first by size of sublist, then by contents
    return sorted(map(sorted, sets), key=lambda x: (len(x), x))

Basically, it just keeps doing two-way intersections between the old result set and the newly found intersections until a round of intersections doesn't change anything, then it's done.

Note: This is not actually the best solution (recursion is sufficiently better algorithmically to win on the test data, where John Coleman's solution, after adding sorting to the outer wrapper so it matches format, takes about 0.94 ms, vs. 3.5 ms for mine). I'm mostly providing it as an example of solving the problem in other ways.
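
If you want to repeat the comparison on your own machine, a small harness along these lines should do. The function names `recursive_intersections`, `fixpoint_intersections` and `canonical` are made up for this sketch, the recursive version is condensed from John Coleman's answer, and the timings will of course differ from the figures quoted above.

```python
from itertools import product, starmap
from operator import and_
from timeit import timeit

def recursive_intersections(frozen_sets):
    # Condensed form of the recursive approach.
    if not frozen_sets:
        return []
    head, tail = frozen_sets[0], frozen_sets[1:]
    tail_result = recursive_intersections(tail)
    return list({head, *tail_result, *(head & s for s in tail_result)})

def fixpoint_intersections(lists):
    # Same loop as in this answer, without the final sorting.
    last = new = sets = set(map(frozenset, lists))
    while new:
        new = set(starmap(and_, product(last, new)))
        last = sets.copy()
        new -= last
        sets |= new
    return sets

def canonical(sets):
    # Sorted sublists, ordered by size and then contents.
    return sorted((sorted(s) for s in sets), key=lambda x: (len(x), x))

A = [[0, 1, 2, 3], [0, 1, 4], [1, 2, 4], [2, 3, 4], [0, 3, 4]]
frozen = [frozenset(s) for s in A]

# Check that both approaches agree before timing them.
same = canonical(recursive_intersections(frozen)) == canonical(fixpoint_intersections(A))

t_rec = timeit(lambda: recursive_intersections(frozen), number=200)
t_fix = timeit(lambda: fixpoint_intersections(A), number=200)
print(f"recursive: {t_rec:.4f}s  fixpoint: {t_fix:.4f}s  (200 runs each)")
```

Swap in the large test input from the question for `A` to get numbers comparable to the ones mentioned here.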

edited Jun 3, 2016 at 22:38

answered Jun 3, 2016 at 21:22

ShadowRanger


4 Comments

ShadowRanger Over a year ago

@ChristianStump: Actually, it takes almost no time, because sets is either growing or not, and when it grows, the != comparison short circuits by simply checking length and moving on (length is tested first, and only if lengths match does it do element by element checking). The flaw is largely in repeated work; it's recombining elements that have already been combined on previous loops.

ShadowRanger Over a year ago

@ChristianStump: And my explanation is now somewhat wrong, because I improved it to remove that flaw (it now intersects newly found intersections with old, avoiding reintersecting old with old on second and subsequent loops). But it still has some extra overhead from set construction, set copying, and set differencing that the recursive solution avoids, so it's still slower.

John Coleman Over a year ago

This is nice. I wasn't familiar with starmap. I continue to be surprised by how much you can do with itertools.

Christian Over a year ago

I do have a followup question, stackoverflow.com/questions/37631049, in case you are interested in thinking about it.
