
The function shown below is running quite slowly even though I used swifter to call it. Does anyone know how to speed this up? My Python knowledge is limited at this point and I would appreciate any help I can get. I tried using the map() function, but somehow it didn't work for me. I guess the nested for loop makes it rather slow, right?

BR, Hannes

def polyData(uniqueIds):
    for index in range(len(uniqueIds) - 1):
        element = uniqueIds[index]
        polyData1 = df[df['id'] == element]
        poly1 = build_poly(polyData1)
        poly1 = poly1.buffer(0)
        for secondIndex in range(index + 1, len(uniqueIds)):
            otherElement = uniqueIds[secondIndex]
            polyData2 = df[df['id'] == otherElement]
            poly2 = build_poly(polyData2)
            poly2 = poly2.buffer(0)
            # Calculate overlap, percentage-wise
            overlap_pct = poly1.intersection(poly2).area / poly1.area
            # Form new DataFrame
            df_ol = pd.DataFrame({'id_1': [element], 'id_2': [otherElement], 'overlap_pct': [overlap_pct]})
            # Write to SQL database
            df_ol.to_sql(name='df_overlap', con=e, if_exists='append', index=False)
  • I think that rather than improving the Python loop, you may want to do a single query that updates all the objects in the DB, since I'm quite sure that the bottleneck is the DB operation. I don't know the SQL wrapper you used here, but I'm quite sure it can be done with any SQL frontend! Commented Sep 27, 2019 at 12:59
  • Hm, you think so? I use SQLAlchemy: e = create_engine('sqlite:///'). And what is the best way to do this? Commented Sep 27, 2019 at 13:01
  • Try googling for pandas bulk insert; you'll find a lot of interesting stuff. Commented Sep 27, 2019 at 13:17
  • Alright, I will try that, thanks for your help. Do you still think it is possible to somehow speed up the for loops? Is map() the right way, or is there something more elegant? Meanwhile I will change to a bulk insert (see the sketch below)! Commented Sep 27, 2019 at 13:19
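
Picking up the bulk-insert suggestion from the comments: a minimal sketch of the idea is to collect all overlap rows in a plain Python list and write them to the database with a single to_sql call at the end, instead of issuing one INSERT per pair. It assumes the df, build_poly, and engine e from the question; the helper name compute_overlaps is made up for illustration.

import pandas as pd

def compute_overlaps(uniqueIds):
    # Same pairwise loop as in the question, but results are
    # accumulated in memory instead of being written one row at a time.
    rows = []
    for index in range(len(uniqueIds) - 1):
        element = uniqueIds[index]
        poly1 = build_poly(df[df['id'] == element]).buffer(0)
        for secondIndex in range(index + 1, len(uniqueIds)):
            otherElement = uniqueIds[secondIndex]
            poly2 = build_poly(df[df['id'] == otherElement]).buffer(0)
            overlap_pct = poly1.intersection(poly2).area / poly1.area
            rows.append({'id_1': element, 'id_2': otherElement, 'overlap_pct': overlap_pct})
    # One bulk insert instead of one INSERT statement per pair
    pd.DataFrame(rows).to_sql(name='df_overlap', con=e, if_exists='append', index=False)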

1 Answer


This function is inherently slow for large amounts of data because of its complexity (it tries every 2-combination of the set of ids). However, you're also building the 'poly' for the same ids multiple times, even though it seems you could build each one only once beforehand (which might be expensive) and store it for later use. So try to extract the building of the polys:

def getPolyForUniqueId(uid):
    polyData = df[df['id'] == uid]
    poly = build_poly(polyData)
    poly = poly.buffer(0)
    return polyData

def polyData(uniqueIds):
    polyDataList = [getPolyForUniqueId(uid) for uid in uniqueIds]
    for index in range(len(uniqueIds) - 1):
        id_1 = uniqueIds[index]
        poly_1 = polyDataList[index]
        for secondIndex in range(index + 1, len(uniqueIds)):
            id_2 = uniqueIds[secondIndex]
            poly_2 = polyDataList[secondIndex]
            ...

3 Comments

Could this be improved by using the zip function?
Alright, that makes a lot of sense. Thank you for this idea! I implemented it; however, now I get the error AttributeError: 'DataFrame' object has no attribute 'intersection'. It seems like poly_1 and poly_2 are not polygons anymore? Or did I do something wrong?
Ok, that works for me; I just had to change return polyData into return poly in the getPolyForUniqueId function. Can I also replace the nested for loop with something faster, like itertools or lambda/map? If so, how? :D
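
As a follow-up to the itertools question in the last comment: here is a minimal sketch of the same pairwise loop written with itertools.combinations, still assuming the df, build_poly, and engine e from the question. It won't change the O(n²) number of pairs, but it removes the index bookkeeping, builds each poly exactly once, and ends with a single bulk insert; the function name overlap_table is made up for illustration.

import itertools
import pandas as pd

def getPolyForUniqueId(uid):
    # Build the polygon for one id exactly once
    poly = build_poly(df[df['id'] == uid])
    return poly.buffer(0)

def overlap_table(uniqueIds):
    # Keep each id next to its precomputed polygon
    polys = [(uid, getPolyForUniqueId(uid)) for uid in uniqueIds]
    rows = [
        {'id_1': id_1, 'id_2': id_2,
         'overlap_pct': poly_1.intersection(poly_2).area / poly_1.area}
        # combinations() yields every unordered pair exactly once
        for (id_1, poly_1), (id_2, poly_2) in itertools.combinations(polys, 2)
    ]
    # Single bulk insert at the end
    pd.DataFrame(rows).to_sql(name='df_overlap', con=e, if_exists='append', index=False)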
