
I have this function containing nested loops. I need to parallelise it for faster execution.

import numpy
import scipy.spatial

def euclid_distance(X, BOW_X):
    d3 = []
    d2 = []

    for l in range(len(X)):
        for n in range(l + 1, len(X)):
            d1 = []
            for m in range(len(X[l])):
                min1 = 999
                p = 0
                while p < len(X[n]):
                    d = scipy.spatial.distance.euclidean(X[l][m], X[n][p])
                    d = d * numpy.min([BOW_X[l][m], BOW_X[n][p]])
                    if d < min1:
                        min1 = d
                    if min1 == 0:
                        break
                    p += 1
                d1.append(min1)

            d2.append(d1)

    for i in range(len(d2)):
        d3.append(sum(d2[i]))

    return d3

Is there some way to do this? X is an array containing lists of lists, which in turn contain vectors.

3 Comments
  • Can you give a small example of what X, BOW_X, C, and the expected output are? Commented Oct 21, 2016 at 5:14
  • X=array( [ [[1,2,3],[2,7,6],[3,0,1]],[[3,3,3],[1,1,1]],[[6,7,5],[9,0,1],[3,7,5],[0,4,4]] ], dtype=object). Description: [1,2,3] is the vector representation of a word, and [[1,2,3],[2,7,6],[3,0,1]] is a document containing 3 words, so the array contains 3 documents with 3, 2, and 4 words respectively. BOW_X=array( [ [[0.1,0.02,0.3],[0.2,0.7,0.6],[0.03,0,0.1]],[[0.03,0.3,0.03],[0.1,0.1,0.1]],[[0.6,0.7,0.5],[0.9,0,0.01],[0.3,0.7,0.5],[0,0.4,0.04]] ], dtype=object). Output: I want to calculate the minimum distance of each word in one doc from each word of the other doc, store the minimum, and then add them up to get the distance between the 2 docs. Commented Oct 25, 2016 at 5:13
  • Please get back to me in case clarification is required. Commented Oct 25, 2016 at 5:18

2 Answers


Have you tried using xrange instead of range? It may help with execution speed. (This should be a comment, but I have not yet unlocked commenting, sorry.)
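Independently of range vs xrange, most of the cost is in the two pure-Python inner loops, which one cdist call per document pair can replace. A sketch (euclid_distance_vec is an illustrative name; it assumes numpy.min([BOW_X[l][m], BOW_X[n][p]]) is meant to collapse both weight vectors to a single scalar, which is what the posted code does, and that the min1 = 999 initialiser never comes into play):

```python
import numpy as np
from scipy.spatial.distance import cdist

def euclid_distance_vec(X, BOW_X):
    """Same pairwise-document distances, with the two inner loops vectorised."""
    d3 = []
    for l in range(len(X)):
        for n in range(l + 1, len(X)):
            # all word-to-word euclidean distances for this document pair
            D = cdist(np.asarray(X[l], float), np.asarray(X[n], float))
            # numpy.min([BOW_X[l][m], BOW_X[n][p]]) reduces both weight
            # vectors to one scalar: the smaller of the two per-word minima
            wl = np.asarray(BOW_X[l], float).min(axis=1)
            wn = np.asarray(BOW_X[n], float).min(axis=1)
            weighted = D * np.minimum(wl[:, None], wn[None, :])
            # min over the other document's words, summed over this one's
            d3.append(float(weighted.min(axis=1).sum()))
    return d3
```

This often beats parallelising the naive loops outright, since the work moves into compiled NumPy/SciPy code.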


1 Comment

That won't help much, as I have a huge dataset. I need a way to parallelise it using the joblib library
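As a sketch of that: every (l, n) document pair is independent, so the outer double loop can be handed to a pool of workers. The names euclid_distance_parallel and pair_distance are illustrative; the standard-library executor is used here, but joblib's Parallel(n_jobs=-1)(delayed(pair_distance)(p) for p in pairs) drops in the same way, and with its default process-based backend also sidesteps the GIL for this CPU-bound work:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from scipy.spatial.distance import euclidean

def pair_distance(args):
    # distance between one pair of documents; mirrors the naive inner loops
    doc_l, doc_n, bow_l, bow_n = args
    total = 0.0
    for m in range(len(doc_l)):
        total += min(euclidean(doc_l[m], doc_n[p]) * np.min([bow_l[m], bow_n[p]])
                     for p in range(len(doc_n)))
    return total

def euclid_distance_parallel(X, BOW_X, max_workers=4):
    # one task per unordered document pair, in the same order as the loops
    pairs = [(X[l], X[n], BOW_X[l], BOW_X[n])
             for l, n in combinations(range(len(X)), 2)]
    # threads are the simplest drop-in; swap in joblib.Parallel (processes)
    # when the per-pair work does not release the GIL
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        return list(ex.map(pair_distance, pairs))
```

ex.map preserves task order, so the result list lines up with the d3 produced by the original function.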

You may use PyCUDA if you have a GPU and there are no data dependencies between iterations. With PyCUDA you can spawn many parallel threads, but there will be some overhead from transferring data between the devices, i.e., the CPU and the GPU.

Comments
