Fixes to mean and better precision on sum (cpu) #3687
willyborn wants to merge 2 commits into arrayfire:master
Thanks for this willyborn. FYI, our Buildbot CI system failed to build the PR with the following error:
b74f28f to b449cbf
I'm on Ubuntu 22.04 LTS with GCC, and I cannot reproduce any of the warnings or errors you mentioned. PS: Just to be certain, I made the same change for all platforms.
This is happening when NVCC is compiling the stable_mean CUDA device kernel. Which version of CUDA are you using? Could you try with 12.9?
b449cbf to 4a8dca4
I am using CUDA 12.9.86 on Ubuntu 24.04.3 LTS, kernel 6.14. As soon as I added all major architectures to the nvcc options, I got the error too. It is now fixed and committed.
When strided weights were used, the mean function typically returned random results.
The result of mean no longer depends on the platform or processing style (linear, strided, on CPU or GPU).
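To illustrate the strided-weights case, here is a minimal sketch of a weighted mean in which the data and the weights each carry their own element stride. The function name `strided_weighted_mean` and its parameters are hypothetical and are not ArrayFire's API. The incremental update is the standard running weighted mean, and this PR does not confirm that it matches the actual kernels. The sketch assumes non-negative weights.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not ArrayFire's API): weighted mean over n elements,
// where data and weights each have their own element stride.
// Assumes non-negative weights.
double strided_weighted_mean(const float* data, std::size_t data_stride,
                             const float* weights, std::size_t weight_stride,
                             std::size_t n) {
    assert(data != nullptr && weights != nullptr);
    double mean = 0.0;        // running weighted mean
    double weight_sum = 0.0;  // running sum of weights
    for (std::size_t i = 0; i < n; ++i) {
        const double x = data[i * data_stride];
        // Index the weights with their own stride, not the data's stride.
        const double w = weights[i * weight_stride];
        if (w == 0.0) continue;  // a zero weight contributes nothing
        weight_sum += w;
        // Incremental update keeps the running value near the data range.
        mean += (w / weight_sum) * (x - mean);
    }
    return mean;
}
```

With data `{1, 2, 3, 4}` and weights `{1, 1, 2, 4}` stored at every second position of a larger buffer, a weight stride of 2 gives 3.125. Reading the same buffer with a stride of 1 picks up the unrelated values in between and produces a different mean.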
Description
b74f28f sum on CPU now has the same precision as other platforms
With this change, the difference between the results for a linear array and a strided array is minimal.
As a result, the precision of many tests can now be tightened, since all platforms generate similar results, independent of parallel or serial processing.
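The effect of accumulation order on precision can be sketched as follows. The PR text does not say which technique the CPU sum now uses. The pairwise reduction below is an assumption chosen for illustration, because its tree shape resembles a parallel reduction. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Serial accumulation in float: each element is added to an ever-growing
// total, so rounding error grows with the running sum.
float serial_sum(const float* v, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) acc += v[i];
    return acc;
}

// Pairwise (tree-shaped) accumulation in float: the two operands of each
// addition have comparable magnitude, which limits rounding error.
float pairwise_sum(const float* v, std::size_t n) {
    assert(n > 0);
    if (n == 1) return v[0];
    const std::size_t half = n / 2;
    return pairwise_sum(v, half) + pairwise_sum(v + half, n - half);
}

// Reference result accumulated in double.
double reference_sum(const float* v, std::size_t n) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) acc += v[i];
    return acc;
}
```

For a large array of identical small values, such as 2^20 copies of `0.1f`, the serial float sum drifts visibly from the double reference, while the pairwise sum stays much closer to it.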
9307d6f Fixes to mean, on all platforms
All:
With this change, the difference between the results for a linear array and a strided array is minimal. As a result, the precision of the tests is improved.
CPU:
CUDA:
ONEAPI:
OPENCL:
TEST:
- extra tests added on all temporary formats.
- the allowed tolerance for tests is lowered, since the correct mean is now calculated for all backends.
- MeanAllTest now uses random data instead of constants for testing. Constant inputs blocked the detection of partial processing of the input.
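The point about constant test data can be shown with a small sketch. `mean_first_half` is a deliberately buggy, hypothetical function that processes only part of its input. It is not taken from ArrayFire, and the actual MeanAllTest may be structured differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Correct mean over the whole input.
double mean_full(const std::vector<float>& v) {
    assert(!v.empty());
    double acc = 0.0;
    for (float x : v) acc += x;
    return acc / static_cast<double>(v.size());
}

// Deliberately buggy (hypothetical): only the first half is processed.
double mean_first_half(const std::vector<float>& v) {
    assert(v.size() >= 2);
    const std::size_t half = v.size() / 2;
    double acc = 0.0;
    for (std::size_t i = 0; i < half; ++i) acc += v[i];
    return acc / static_cast<double>(half);
}

// Returns true when comparing against the correct mean exposes the bug.
bool test_detects_bug(const std::vector<float>& input, double tol = 1e-6) {
    return std::fabs(mean_full(input) - mean_first_half(input)) > tol;
}
```

A constant input, such as eight copies of 5.0, passes unnoticed because the mean of any subset is still 5.0. A varying input, such as the values 0 through 7, exposes the partial processing.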
Additional information about the PR answering following questions:
Fixes:
Changes to Users
Better precision in all circumstances, independent of platform or array format.
Checklist