1

I have a list of sets a and another list of sets b.

I want to intersect each set in a against every set in b, and if the result is not empty, then report the index positions of the matched sets, and the length of the intersection.

I have the following nested for-loop, but wondered if there was a faster implementation possible.

import random as ra    
r1 = 500
r2 = 100
r3 = 5000
a = [set([ra.randrange(r1) for _ in range(ra.randrange(r2))]) for __ in range(ra.randrange(r3))]
b = [set([ra.randrange(r1) for _ in range(ra.randrange(r2))]) for __ in range(ra.randrange(r3))]

_match_detail = []

for s1 in range(len(a)):
    for s2 in range(len(b)):
        x = a[s1] & b[s2]
        lenx = len(x)
        if lenx > 0:
            _match_detail.append((s1,s2,lenx))
            

With about 4000 sets each in a and b, it takes about 20 seconds to run on my machine (an i7-8700 CPU @3.20GHz with 16GB memory). I'll eventually be dealing with hundreds of thousands of sets, so this doesn't scale nicely.

I've seen this previous question but I don't think it's applicable.

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.