
I have this function containing nested loops. I need to parallelise it for faster execution.

import numpy
import scipy.spatial

def euclid_distance(X, BOW_X):
    d3 = []
    d2 = []

    for l in range(len(X)):
        for n in range(l + 1, len(X)):
            d1 = []
            for m in range(len(X[l])):
                min1 = 999
                p = 0
                while p < len(X[n]):
                    d = scipy.spatial.distance.euclidean(X[l][m], X[n][p])
                    d = d * numpy.min([BOW_X[l][m], BOW_X[n][p]])
                    if d < min1:
                        min1 = d
                    if min1 == 0:
                        break
                    p += 1
                d1.append(min1)

            d2.append(d1)

    for i in range(len(d2)):
        d3.append(sum(d2[i]))

    return d3

Is there some way to do this? X is an array containing lists of lists, which in turn contain vectors.

3 Comments
  • Can you give a small example of what X, BOW_X, C, and the expected output are? Commented Oct 21, 2016 at 5:14
  • X=array( [ [[1,2,3],[2,7,6],[3,0,1]],[[3,3,3],[1,1,1]],[[6,7,5],[9,0,1],[3,7,5],[0,4,4]] ], dtype=object). Description: [1,2,3] is the vector representation of a word, and [[1,2,3],[2,7,6],[3,0,1]] is a document containing 3 words, so the array contains 3 documents with 3, 2, and 4 words respectively. BOW_X=array( [ [[0.1,0.02,0.3],[0.2,0.7,0.6],[0.03,0,0.1]],[[0.03,0.3,0.03],[0.1,0.1,0.1]],[[0.6,0.7,0.5],[0.9,0,0.01],[0.3,0.7,0.5],[0,0.4,0.04]] ], dtype=object). Output: I want to calculate the minimum distance of each word in one doc from each word of the other doc, store the minimum, and then add them up to get the distance between the 2 docs. Commented Oct 25, 2016 at 5:13
  • Please get back to me in case clarification is required. Commented Oct 25, 2016 at 5:18

2 Answers


Have you tried using xrange instead of range? It may help with execution speed. (This should be a comment, but I have not yet unlocked commenting, sorry.)
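Independently of range vs xrange, most of the cost is in the two pure-Python inner loops, which one cdist call per document pair can replace. A sketch (euclid_distance_vec is an illustrative name; it assumes numpy.min([BOW_X[l][m], BOW_X[n][p]]) is meant to collapse both weight vectors to a single scalar, which is what the posted code does, and that the min1 = 999 initialiser never comes into play):

```python
import numpy as np
from scipy.spatial.distance import cdist

def euclid_distance_vec(X, BOW_X):
    """Same pairwise-document distances, with the two inner loops vectorised."""
    d3 = []
    for l in range(len(X)):
        for n in range(l + 1, len(X)):
            # all word-to-word euclidean distances for this document pair
            D = cdist(np.asarray(X[l], float), np.asarray(X[n], float))
            # numpy.min([BOW_X[l][m], BOW_X[n][p]]) reduces both weight
            # vectors to one scalar: the smaller of the two per-word minima
            wl = np.asarray(BOW_X[l], float).min(axis=1)
            wn = np.asarray(BOW_X[n], float).min(axis=1)
            weighted = D * np.minimum(wl[:, None], wn[None, :])
            # min over the other document's words, summed over this one's
            d3.append(float(weighted.min(axis=1).sum()))
    return d3
```

This often beats parallelising the naive loops outright, since the work moves into compiled NumPy/SciPy code.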


1 Comment

That won't help much, as I have a huge dataset. I need a way to parallelise it using the joblib library
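As a sketch of that: every (l, n) document pair is independent, so the outer double loop can be handed to a pool of workers. The names euclid_distance_parallel and pair_distance are illustrative; the standard-library executor is used here, but joblib's Parallel(n_jobs=-1)(delayed(pair_distance)(p) for p in pairs) drops in the same way, and with its default process-based backend also sidesteps the GIL for this CPU-bound work:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from scipy.spatial.distance import euclidean

def pair_distance(args):
    # distance between one pair of documents; mirrors the naive inner loops
    doc_l, doc_n, bow_l, bow_n = args
    total = 0.0
    for m in range(len(doc_l)):
        total += min(euclidean(doc_l[m], doc_n[p]) * np.min([bow_l[m], bow_n[p]])
                     for p in range(len(doc_n)))
    return total

def euclid_distance_parallel(X, BOW_X, max_workers=4):
    # one task per unordered document pair, in the same order as the loops
    pairs = [(X[l], X[n], BOW_X[l], BOW_X[n])
             for l, n in combinations(range(len(X)), 2)]
    # threads are the simplest drop-in; swap in joblib.Parallel (processes)
    # when the per-pair work does not release the GIL
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        return list(ex.map(pair_distance, pairs))
```

ex.map preserves task order, so the result list lines up with the d3 produced by the original function.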

You may use PyCUDA if you have a GPU and there are no data dependencies between iterations. With PyCUDA you can spawn many parallel threads, but there will be some overhead from transferring data between the devices, i.e., the CPU and the GPU.

Comments
