
The function shown below is running quite slowly even though I used swifter to call it. Does anyone know how to speed this up? My Python knowledge is limited at this point and I would appreciate any help I can get. I tried using the map() function, but somehow it didn't work for me. I guess the nested for loop makes it rather slow, right?

BR, Hannes

def polyData(uniqueIds):
    for index in range(len(uniqueIds) - 1):
        element = uniqueIds[index]
        polyData1 = df[df['id'] == element]
        poly1 = build_poly(polyData1)
        poly1 = poly1.buffer(0)
        for secondIndex in range(index + 1, len(uniqueIds)):
            otherElement = uniqueIds[secondIndex]
            polyData2 = df[df['id'] == otherElement]
            poly2 = build_poly(polyData2)
            poly2 = poly2.buffer(0)
            # Calculate overlap, percentage-wise
            overlap_pct = poly1.intersection(poly2).area / poly1.area
            # Form new DataFrame
            df_ol = pd.DataFrame({'id_1': [element], 'id_2': [otherElement], 'overlap_pct': [overlap_pct]})
            # Write to SQL database
            df_ol.to_sql(name='df_overlap', con=e, if_exists='append', index=False)
  • I think that rather than improving the Python loop, you may want to do a single query that updates all the objects in the DB, since I'm quite sure that the bottleneck is the DB operation. I don't know the SQL wrapper you used here, but I'm quite sure it can be done with any SQL frontend! Commented Sep 27, 2019 at 12:59
  • Hm, you think so? I use SQLAlchemy: e = create_engine('sqlite:///'). And what is the best way to do this? Commented Sep 27, 2019 at 13:01
  • Try googling for pandas bulk insert; you'll find a lot of interesting stuff. Commented Sep 27, 2019 at 13:17
  • Alright, I will try that, thanks for your help. Do you still think it is possible to somehow speed up the for loops? Is map() the right way, or is there something more elegant? Meanwhile I will change to a bulk insert (see the sketch below)! Commented Sep 27, 2019 at 13:19
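
Picking up the bulk-insert suggestion from the comments: a minimal sketch of the idea is to collect all overlap rows in a plain Python list and write them to the database with a single to_sql call at the end, instead of issuing one INSERT per pair. It assumes the df, build_poly, and engine e from the question; the helper name compute_overlaps is made up for illustration.

import pandas as pd

def compute_overlaps(uniqueIds):
    # Same pairwise loop as in the question, but results are
    # accumulated in memory instead of being written one row at a time.
    rows = []
    for index in range(len(uniqueIds) - 1):
        element = uniqueIds[index]
        poly1 = build_poly(df[df['id'] == element]).buffer(0)
        for secondIndex in range(index + 1, len(uniqueIds)):
            otherElement = uniqueIds[secondIndex]
            poly2 = build_poly(df[df['id'] == otherElement]).buffer(0)
            overlap_pct = poly1.intersection(poly2).area / poly1.area
            rows.append({'id_1': element, 'id_2': otherElement, 'overlap_pct': overlap_pct})
    # One bulk insert instead of one INSERT statement per pair
    pd.DataFrame(rows).to_sql(name='df_overlap', con=e, if_exists='append', index=False)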

1 Answer


This function is inherently slow for large amounts of data because of its complexity (it tries every 2-combination of the set of ids). However, you're also building the 'poly' for the same ids multiple times, even though it seems you could build each one only once beforehand (which might be expensive) and store it for later use. So try to extract the building of the polys:

def getPolyForUniqueId(uid):
    polyData = df[df['id'] == uid]
    poly = build_poly(polyData)
    poly = poly.buffer(0)
    return polyData

def polyData(uniqueIds):
    polyDataList = [getPolyForUniqueId(uid) for uid in uniqueIds]
    for index in range(len(uniqueIds) - 1):
        id_1 = uniqueIds[index]
        poly_1 = polyDataList[index]
        for secondIndex in range(index + 1, len(uniqueIds)):
            id_2 = uniqueIds[secondIndex]
            poly_2 = polyDataList[secondIndex]
            ...

3 Comments

Could this be improved by using the zip function?
Alright, that makes a lot of sense. Thank you for this idea! I implemented it; however, now I get the error AttributeError: 'DataFrame' object has no attribute 'intersection'. It seems like poly_1 and poly_2 are not polygons anymore? Or did I do something wrong?
Ok, that works for me; I just had to change return polyData into return poly in the getPolyForUniqueId function. Can I also replace the nested for loop with something faster, like itertools or lambda/map? If so, how? :D
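
As a follow-up to the itertools question in the last comment: here is a minimal sketch of the same pairwise loop written with itertools.combinations, still assuming the df, build_poly, and engine e from the question. It won't change the O(n²) number of pairs, but it removes the index bookkeeping, builds each poly exactly once, and ends with a single bulk insert; the function name overlap_table is made up for illustration.

import itertools
import pandas as pd

def getPolyForUniqueId(uid):
    # Build the polygon for one id exactly once
    poly = build_poly(df[df['id'] == uid])
    return poly.buffer(0)

def overlap_table(uniqueIds):
    # Keep each id next to its precomputed polygon
    polys = [(uid, getPolyForUniqueId(uid)) for uid in uniqueIds]
    rows = [
        {'id_1': id_1, 'id_2': id_2,
         'overlap_pct': poly_1.intersection(poly_2).area / poly_1.area}
        # combinations() yields every unordered pair exactly once
        for (id_1, poly_1), (id_2, poly_2) in itertools.combinations(polys, 2)
    ]
    # Single bulk insert at the end
    pd.DataFrame(rows).to_sql(name='df_overlap', con=e, if_exists='append', index=False)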
