
While rewriting old MATLAB code in NumPy, I noticed differences in the logarithmic calculation. In NumPy I use np.log; MATLAB uses the log function.

% MATLAB
b = [1 1 2 3 5 1 1];
p = b ./ sum(b);
sprintf('log(%.20f) = %.20f', p(5), log(p(5)))

# Python
import numpy as np
b = np.array([1, 1, 2, 3, 5, 1, 1])
p = b.astype('float64') / np.sum(b)
print(f'log({p[4]:.20f}) = {np.log(p[4]):.20f}')

On my MacBook Pro 2020 with an M1 chip, I get a mismatch at the 16th decimal digit.

log(0.35714285714285715079) = -1.02961941718115834732  # Matlab
log(0.35714285714285715079) = -1.02961941718115812527  # NumPy

I would like to get exactly the same results. Any idea, how to modify my Python code?

  • Does that difference really matter for your specific application? Keep in mind that neither of those numbers is actually correct. Per WolframAlpha, ln(5/14) = -1.029619417181158 239921825531675168658... Commented Dec 10, 2023 at 20:41
  • For practical applications doing a single log call, no. But for research this is important. What if the imprecisions cumulate, and the results deviate even more? Imagine a deep neural network with a log-based activation function that learns something different when implemented in Python versus MATLAB. Commented Dec 10, 2023 at 20:56
  • It's difficult to say how MATLAB gets its value without seeing the implementation. Using the bilinear expansion in Python produces the same result as NumPy, so they might be using that (I don't know where in the NumPy source code it is implemented). I computed the natural log in C and that gave yet another value, -1.029619417181158 21776. The result is inconsistent between languages, which is unsurprising. Commented Dec 10, 2023 at 21:31
  • "What if the imprecisions cumulate, and the results deviate even more?" This is why you need numerically stable algorithms, so that rounding errors don't accumulate. Scientific computing is a field of research for a reason. "Imagine a deep neural network…" DL will never depend on the 16th digit. DL is usually implemented with 16-bit or even 8-bit floats; speed and energy efficiency are more important than precision. Commented Dec 10, 2023 at 22:28
  • My colleagues presented a paper on the impact of numerical deviations on ML at NeurIPS, so it is not completely off topic: informationsecurity.uibk.ac.at/pdfs/SHB2023_NEURIPS.pdf Commented Dec 11, 2023 at 8:41

1 Answer


Both MATLAB and NumPy use 64-bit floats by default, which have a 52-bit mantissa. This means the smallest relative step between two float64 numbers is 2**-52 = 2.2e-16, so any decimal digit after the 16th has no significance. The difference you're seeing is probably due to a slightly different implementation. You can check this by using

np.nextafter(a, 1)-a

For a = np.log(0.35714285714285715079) you get 2.2e-16, which is roughly the machine precision np.finfo(np.float64).eps.
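Concretely, the spacing check can be run like this (a minimal sketch of the nextafter/eps comparison described above):

```python
import numpy as np

a = np.log(np.float64(0.35714285714285715079))

# Distance from a to the next representable float64 toward 1 (one ULP).
ulp = np.nextafter(a, 1) - a
print(ulp)                       # ~2.2e-16

# Machine epsilon: the relative spacing of float64 near 1.
print(np.finfo(np.float64).eps)  # 2.220446049250313e-16
```

Since |a| lies in [1, 2), the absolute spacing equals the machine epsilon exactly, so the two MATLAB/NumPy results differ by only a few ULPs.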

Even the input illustrates this: you're providing more decimals than necessary to completely define a 64-bit float. We can set the number of displayed decimals to 100 and it will still print only 17 digits for this reason:

>>> np.set_printoptions(precision=100)
>>> np.array([0.35714285714285715079])
array([0.35714285714285715])

The difference between MATLAB and NumPy might even be caused by a reordered sum, as floating-point addition is not associative. If you do depend on the 16th decimal place, you should use something other than 64-bit floats. I'd recommend familiarizing yourself with how floating-point types are implemented, as that is vital when working on scientific software. And if you'd like, take a look at the NumPy source code to see how log is implemented, and compare it to other open libraries.
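The non-associativity of floating-point addition is easy to demonstrate with a classic example:

```python
# Floating-point addition is not associative: grouping changes which
# intermediate results get rounded, so a reordered sum can differ in
# the last bits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

The same effect means that summing b in a different order (as MATLAB and NumPy may do internally) can shift sum(b), p, and ultimately log(p) by an ULP or so.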


2 Comments

Related to this answer, it is worth noting that MATLAB's sum was recently changed to better deal with round-off errors.
And likewise, np.sum is inferior to math.fsum if you want the last bit of precision.
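A minimal sketch of the accumulation difference mentioned in the last comment (Python's built-in sum stands in for naive left-to-right accumulation; np.sum uses pairwise summation, which typically lands in between):

```python
import math
import numpy as np

xs = [0.1] * 10  # 0.1 is not exactly representable in binary

print(sum(xs))        # 0.9999999999999999 – naive left-to-right accumulation
print(math.fsum(xs))  # 1.0 – exactly rounded (Shewchuk's algorithm)
print(np.sum(xs))     # pairwise summation; error grows more slowly
```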
