I have a 2D cost matrix M, perhaps 400x400, and I'm trying to calculate the optimal path through it. As such, I have a function like:
M[i,j] = M[i,j] + min(M[i-1,j-1],M[i-1,j]+P1,M[i,j-1]+P1)
which is obviously recursive. P1 is some additive constant. My code, which works more or less, is:
def optimalcost(cost, P1=10):
width1,width2 = cost.shape
M = array(cost)
for i in range(0,width1):
for j in range(0,width2):
try:
M[i,j] = M[i,j] + min(M[i-1,j-1],M[i-1,j]+P1,M[i,j-1]+P1)
except:
M[i,j] = inf
return M
Now I know looping in Numpy is a terrible idea, and for things like the calculation of the initial cost matrix I've been able to find shortcuts to cutting the time down. However, as I need to evaluate potentially the entire matrix I'm not sure how else to do it. This takes around 3 seconds per call on my machine and must be applied to around 300 of these cost matrices. I'm not sure where this time comes from, as profiling says the 200,000 calls to min only take 0.1s - maybe memory access?
Is there a way to do this in parallel somehow? I assume there may be, but to me it seems each iteration is dependent unless there's a smarter way to memoize things.
There are parallels to this question: Can I avoid Python loop overhead on dynamic programming with numpy?
I'm happy to switch to C if necessary, but I like the flexibility of Python for rapid testing and the lack of faff with file IO. Off the top of my head, is something like the following code likely to be significantly faster?
#define P1 10
void optimalcost(double** costin, double** costout){
/*
We assume that costout is initially
filled with costin's values.
*/
float a,b,c,prevcost;
for(i=0;i<400;i++){
for(j=0;j<400;j++){
a = prevcost+P1;
b = costout[i][j-1]+P1;
c = costout[i-1][j-1];
costout[i][j] += min(prevcost,min(b,c));
prevcost = costout[i][j];
}
}
}
return;
Update:
I'm on Mac, and I don't want to install a whole new Python toolchain so I used Homebrew.
> brew install llvm --rtti
> LLVM_CONFIG_PATH=/usr/local/opt/llvm/bin/llvm-config pip install llvmpy
> pip install numba
New "numba'd" code:
from numba import autojit, jit
import time
import numpy as np
@autojit
def cost(left, right):
height,width = left.shape
cost = np.zeros((height,width,width))
for row in range(height):
for x in range(width):
for y in range(width):
cost[row,x,y] = abs(left[row,x]-right[row,y])
return cost
@autojit
def optimalcosts(initcost):
costs = zeros_like(initcost)
for row in range(height):
costs[row,:,:] = optimalcost(initcost[row])
return costs
@autojit
def optimalcost(cost):
width1,width2 = cost.shape
P1=10
prevcost = 0.0
M = np.array(cost)
for i in range(1,width1):
for j in range(1,width2):
M[i,j] += min(M[i-1,j-1],prevcost+P1,M[i,j-1]+P1)
prevcost = M[i,j]
return M
prob_size = 400
left = np.random.rand(prob_size,prob_size)
right = np.random.rand(prob_size,prob_size)
print '---------- Numba Time ----------'
t = time.time()
c = cost(left,right)
optimalcost(c[100])
print time.time()-t
print '---------- Native python Time --'
t = time.time()
c = cost.py_func(left,right)
optimalcost.py_func(c[100])
print time.time()-t
It's interesting writing code in Python that is so un-Pythonic. Note for anyone interested in writing Numba code, you need to explicitly express loops in your code. Before, I had the neat Numpy one-liner,
abs(left[row,:][:,newaxis] - right[row,:])
to calculate the cost. That took around 7 seconds with Numba. Writing out the loops properly gives 0.5s.
It's an unfair comparison to compare it to native Python code, because Numpy can do that pretty quickly, but:
Numba compiled: 0.509318113327s
Native: 172.70626092s
I'm impressed both by the numbers and how utterly simple the conversion is.
i = 0,j = 0, so you getM[0, 0] = M[0, 0] + min(M[-1, -1], M[-1, 0] + P1, M[0, -1] + P1). It seems to me that yourtrywas trying to catch indices out of bounds (you should, by the way, make explicit what you are trying to catch, i.e. doexcept IndexError), but the-1s in your indices are taken to mean "last element along that dimension" so nothing ever gets set tonp.inf. I don't think that's what you want, but please confirm.autojit.numba. Just what I was looking for. I've taken the liberty of correcting indentation and few other things in the "numba'd" code.