I just bumped into some weird performance "issue"/"gain" with python3. The following code loads 5 weight matrices and applies them to a fairly large dataset. While doing so it writes each row out to disk.
When I execute this program, all eight processor cores are at 100% utilization.
Does Python automatically execute a program on multiple threads? If so, is there any documentation on this? If not, how can it be that this program saturates all 8 cores of an octa-core machine?
    #!/usr/bin/python3
    import numpy
    import struct
    from scipy.special import expit
    from dA import load_data
    from dA import load_wb
    import sys

    if __name__ == '__main__':
        stages = [2223, 723, 172, 84, 21]
        wb = []
        for stage in stages:
            w, b = load_wb("%d" % (stage))
            print(numpy.max(w))
            wb.append((w, b))
        data = load_data()
        n = data.shape[0]
        dimensions = stages[-1]
        filename = "%d.data" % (dimensions)
        chunk = ">" + ('f' * dimensions)
        with open(filename, "wb") as f:
            for i in range(n):
                row = data[i]
                for (w, b) in wb:
                    row = 2 * expit(2 * (numpy.dot(row, w) + b)) - 1
                s = struct.pack(chunk, *row)
                f.write(s)
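As an aside, the row-at-a-time loop can be vectorized: applying each (w, b) to the whole data matrix at once gives numpy one large matrix product per stage instead of n small ones. A sketch with made-up shapes and random stand-ins for load_data()/load_wb(); it also uses the identity 2*expit(2*x) - 1 == tanh(x) so only numpy is needed:

```python
import numpy

# Hypothetical stand-ins for load_data()/load_wb(): random values with
# small shapes, just to illustrate the vectorized pass.
rng = numpy.random.default_rng(0)
data = rng.random((100, 10))                    # n=100 rows, 10 features
wb = [(rng.random((10, 5)), rng.random(5)),     # stage 1: 10 -> 5
      (rng.random((5, 2)), rng.random(2))]      # stage 2: 5 -> 2

out = data
for w, b in wb:
    # One large matrix product per stage instead of n small ones;
    # 2*expit(2*x) - 1 is exactly tanh(x).
    out = numpy.tanh(numpy.dot(out, w) + b)

# Whole-array write: big-endian float32, the same byte layout as the
# per-row struct.pack(">" + "f"*dimensions, *row) calls.
buf = out.astype('>f4').tobytes()
print(out.shape, len(buf))
```

The large dot products are also exactly where a multithreaded BLAS can spread work across cores.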
The parallelism comes from the multithreaded BLAS library that numpy calls for dot. numpy releasing the GIL only means that other Python threads can run while numpy computes the result (parallelized or not); when control returns to Python, the GIL is reacquired. I mentioned the GIL only to emphasize that pure Python code does not generally run in parallel; numpy running in parallel is not really related to that.
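One way to confirm this is to cap the BLAS thread pool before numpy is imported: if the multi-core usage disappears, the parallelism was coming from the BLAS behind dot. A sketch; which environment variable takes effect depends on the BLAS your numpy build is linked against (OpenBLAS, MKL, or an OpenMP build):

```python
import os

# Must be set *before* numpy is imported; the BLAS reads them at startup.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy

a = numpy.ones((500, 500))
b = numpy.ones((500, 500))
c = numpy.dot(a, b)    # now confined to a single core
print(c[0, 0])         # each entry is the sum of 500 ones -> 500.0
```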