Python Numpy Data Types Performance - Stack Overflow
https://stackoverflow.com/q/15340781
Asked by Gonzo (https://stackoverflow.com/users/598794) on 2013-03-11T14:13:07Z, last edited 2013-03-11T14:44:23Z, score 30

<p>So I did some testing and got odd results.</p>

<p>Code:</p>

<pre><code>import numpy as np
import timeit

setup = """
import numpy as np
A = np.ones((1000,1000,3), dtype=datatype)
"""

datatypes = "np.uint8", "np.uint16", "np.uint32", "np.uint64", "np.float16", "np.float32", "np.float64"

stmt1 = """
A = A * 255
A = A / 255
A = A - 1
A = A + 1
"""
#~ np.uint8   : 1.04969205993
#~ np.uint16  : 1.19391073202
#~ np.uint32  : 1.37279821351
#~ np.uint64  : 2.99286961148
#~ np.float16 : 9.62375889588
#~ np.float32 : 0.884994368045
#~ np.float64 : 0.920502625252

stmt2 = """
A *= 255
A /= 255
A -= 1
A += 1
"""
#~ np.uint8   : 0.959514497259
#~ np.uint16  : 0.988570167659
#~ np.uint32  : 0.963571471946
#~ np.uint64  : 2.07768933333
#~ np.float16 : 9.40085450056
#~ np.float32 : 0.882363984225
#~ np.float64 : 0.910147440048

stmt3 = """
A = A * 255 / 255 - 1 + 1
"""
#~ np.uint8   : 1.05919667881
#~ np.uint16  : 1.20249978404
#~ np.uint32  : 1.58037744789
#~ np.uint64  : 3.47520357571
#~ np.float16 : 10.4792515701
#~ np.float32 : 1.29654744484
#~ np.float64 : 1.80735079168

stmt4 = """
A[:,:,:2] *= A[:,:,:2]
"""
#~ np.uint8   : 1.23270964172
#~ np.uint16  : 1.3260807837
#~ np.uint32  : 1.32571002402
#~ np.uint64  : 1.76836543305
#~ np.float16 : 2.83364821535
#~ np.float32 : 1.31282323872
#~ np.float64 : 1.44151875479

stmt5 = """
A[:,:,:2] = A[:,:,:2] * A[:,:,:2]
"""
#~ np.uint8   : 1.38166223494
#~ np.uint16  : 1.49569114821
#~ np.uint32  : 1.53105315419
#~ np.uint64  : 2.03457943366
#~ np.float16 : 3.01117795524
#~ np.float32 : 1.51807271679
#~ np.float64 : 1.7164808877

stmt6 = """
A *= 4
A /= 4
"""
#~ np.uint8   : 0.698176392658
#~ np.uint16  : 0.709560468038
#~ np.uint32  : 0.701653066443
#~ np.uint64  : 1.64199069295
#~ np.float16 : 4.86752675499
#~ np.float32 : 0.421001675475
#~ np.float64 : 0.433056710408

stmt7 = """
np.left_shift(A, 2, A)
np.right_shift(A, 2, A)
"""
#~ np.uint8   : 0.381521115341
#~ np.uint16  : 0.383545967785
#~ np.uint32  : 0.386147272415
#~ np.uint64  : 0.665969478824

for stmt in [stmt1, stmt2, stmt3, stmt4, stmt5, stmt6, stmt7]:
    print stmt
    for d in datatypes:
        s = setup.replace("datatype", d)
        T = timeit.Timer(stmt=stmt, setup=s)
        print d, ":", min(T.repeat(number=30))
    print
    print
</code></pre>

<p>Why is float16 so slow? Why is float32 so fast? It is often faster than the integer operations.</p>

<p>If you have any related performance tips I would be glad to hear them.</p>

<p>This is Python 2.6.6 32-bit on Windows 8 64-bit. The numbers for NumPy 1.6 and NumPy 1.7 are similar. I will now test the MKL-optimized build from <a href="http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy">http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy</a>.</p>

<p>edit: it turns out the MKL build is slightly faster in some floating-point cases but sometimes much slower for integer ops:</p>

<pre><code>stmt2 = """
A *= 255
A /= 255
A -= 1
A += 1
"""
# np1.6
#~ np.uint8   : 0.959514497259
#~ np.uint16  : 0.988570167659
#~ np.uint32  : 0.963571471946
#~ np.uint64  : 2.07768933333
#~ np.float16 : 9.40085450056
#~ np.float32 : 0.882363984225
#~ np.float64 : 0.910147440048

# np1.7
#~ np.uint8   : 0.979
#~ np.uint16  : 1.010
#~ np.uint32  : 0.972
#~ np.uint64  : 2.081
#~ np.float16 : 9.362
#~ np.float32 : 0.882
#~ np.float64 : 0.918

# np1.7 mkl
#~ np.uint8   : 1.782
#~ np.uint16  : 1.145
#~ np.uint32  : 1.265
#~ np.uint64  : 2.088
#~ np.float16 : 9.029
#~ np.float32 : 0.800
#~ np.float64 : 0.866
</code></pre>

Answer by Bálint Aradi (https://stackoverflow.com/users/1859258), 2013-03-11T14:33:43Z, score 29
https://stackoverflow.com/questions/15340781/-/15341193#15341193

<p>Half precision arithmetic
(float16) is something that, I guess, must be "emulated" by NumPy, as there are no corresponding types (and no corresponding processor instructions) in the underlying C language. Single precision (float32) and double precision (float64) operations, on the other hand, map to native data types and can be executed very efficiently.</p>

<p>As for the good performance of single-precision operations: modern processors have efficient units for vectorized floating-point arithmetic (e.g. AVX), which is also needed for good multimedia performance.</p>

Answer by user395760 (https://stackoverflow.com/users/0), 2013-03-11T14:38:39Z, score 12
https://stackoverflow.com/questions/15340781/-/15341303#15341303

<p>16-bit floating point numbers are not directly supported by most common CPUs (though graphics card vendors are apparently involved with this data type, so I expect GPUs to support it eventually). I expect them to be emulated, in a comparatively slow way. Google tells me that <a href="http://mail.scipy.org/pipermail/numpy-discussion/2008-January/030672.html">float16 was once hardware-dependent</a> and some people wanted to emulate it for hardware that doesn't support it, though I didn't find anything on whether that actually happened.</p>

<p>32-bit floats, on the other hand, are not only supported natively; you can also vectorize many operations on them with SIMD instruction set extensions, which drastically reduces the overhead for the kind of operations you benchmark. The exception is shuffling data around, but in that case float32 is on par with int32, and both can use the same SIMD instructions to load and store larger blocks of memory.</p>

<p>While there are also SIMD instructions for integer math, they are less common (e.g., SSE introduced them in a later version than their float counterparts) and often less sophisticated.
My guess is that (your build of) NumPy doesn't have SIMD implementations of the operations that are slower for you. Alternatively, the integer operations may simply be less optimized: floats are used in many easy-to-vectorize applications whose performance matters a lot (e.g. image/media/video encoding and decoding), so they may receive more optimization effort.</p>
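Both answers attribute the float16 slowdown to software emulation. If that is right, one workaround follows directly: keep the array in float16 for storage, but upcast once and do the arithmetic in float32. Below is a minimal Python 3 sketch of that idea; the array shape and statements mirror the question's benchmark, but the harness itself is my own, and how much the upcast path helps depends on the NumPy build and CPU:

```python
import numpy as np
import timeit

# Assumption (from the answers): float16 ufuncs are emulated in software,
# so per-element arithmetic on a float16 array is slow.
setup = "import numpy as np; A = np.ones((1000, 1000, 3), dtype=np.float16)"

# Arithmetic directly on the float16 array (the slow path in the question).
direct = "A *= 255; A /= 255"

# Upcast once, compute in float32, downcast once for storage.
upcast = ("B = A.astype(np.float32); B *= 255; B /= 255; "
          "A = B.astype(np.float16)")

for label, stmt in [("float16 in place", direct), ("via float32", upcast)]:
    t = min(timeit.repeat(stmt, setup=setup, number=10))
    print(label, ":", t)
```

The two `astype` round-trips cost a full copy each, so this only pays off when several operations are chained between the casts.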
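On the asker's request for general tips: the question's stmt7 already uses the three-argument ufunc form (`np.left_shift(A, 2, A)`) to write results back into `A`. The same `out=` pattern works for any ufunc, so the temporary-free behaviour of `A *= 255` can be had even when the augmented-assignment form is inconvenient. A small Python 3 sketch (names are mine, not from the thread):

```python
import numpy as np

A = np.ones((1000, 1000, 3), dtype=np.float32)

# Each of these allocates a fresh ~12 MB temporary per operation:
#   A = A * 255
#   A = A / 255
# The three-argument ufunc form reuses A's buffer instead:
np.multiply(A, 255, A)   # equivalent to A *= 255, no temporary
np.divide(A, 255, A)     # equivalent to A /= 255, no temporary

# The ufunc returns the output array itself, so calls can be checked:
result = np.multiply(A, 1, A)
print(result is A)  # the same buffer, no copy was made
```

Avoiding the temporaries saves both the allocation and a second pass over memory, which matters for arrays this size where the operations are memory-bound rather than compute-bound.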