-
Notifications
You must be signed in to change notification settings - Fork 552
Fixes random number generator with normal distribution #2980
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* boxMuller for float randn was being passed incorrect portions of the counter/seed
The Chi2 tests measure the quality of the random number generator. This test is too expensive to run on the CPU
09bbcdf to
71aff29
Compare
|
Concerning the chi-squared test: That is also possible for the normal distribution:
(source: https://cran.r-project.org/web/packages/RcppZiggurat/vignettes/RcppZiggurat.pdf) I have tried to code this as follows: #include <iostream>
#include <iomanip>
#include <cassert>
#include <arrayfire.h>
#include <af/traits.hpp>
using af::array;
using af::randomEngine;
using af::randomEngineType;
template <typename T>
T chi2_statistic(array input, array expected) {
expected *= af::sum<T>(input) / af::sum<T>(expected);
array diff = input - expected;
return af::sum<T>((diff * diff) / expected);
}
// should be used only for x <= 5 (roughly)
array cnd(array x)
{
return 0.5 * erfc(- x * sqrt(0.5));
}
template <typename T>
bool testRandomEngineNormalChi2(randomEngineType type)
{
af::dtype ty = (af::dtype)af::dtype_traits<T>::af_type;
int elem = 1024*1024;
int steps = 256*32;
int bins = 100;
T lower_edge = -7.0;
T upper_edge = 7.0;
array total_hist = af::constant(0.0, 2 * bins, ty);
array edges = af::seq(bins + 1)/bins * lower_edge;
array expected = -af::diff1(cnd(edges));
expected = af::join(0, expected(af::seq(bins-1, 0, -1)), expected);
af::randomEngine r(type, 0);
// R> qchisq(c(5e-6, 1 - 5e-6), 197)
// [1] 121.3197 297.2989
T lower = 121.3197;
T upper = 297.2989;
bool prev_step = true;
bool prev_total = true;
af::setSeed(0x76fa214467690e3c);
std::cout << std::setw(4) << "step"
<< std::setw(7) << "chi2_i"
<< std::setw(7) << "chi2_t"
<< std::setprecision(2) << std::fixed << std::endl;
for (int i = 0; i < steps; ++i) {
array step_hist = af::histogram(af::randn(elem, ty, r), 2 * bins, lower_edge, upper_edge);
T step_chi2 = chi2_statistic<T>(step_hist, expected);
std::cout << std::setw(4) << i
<< std::setw(7) << step_chi2;
bool step = step_chi2 > lower && step_chi2 < upper;
if (!(step || prev_step)) return false;
prev_step = step;
total_hist += step_hist;
T total_chi2 = chi2_statistic<T>(total_hist, expected);
std::cout << std::setw(7) << total_chi2
<< std::endl;
bool total = total_chi2 > lower && total_chi2 < upper;
if (!(total || prev_total)) return false;
prev_total = total;
}
return true;
}
int main(int argc, char ** argv) {
int backend = argc > 1 ? atoi(argv[1]) : 0;
af::setBackend(static_cast<af::Backend>(backend));
int device = argc > 2 ? atoi(argv[2]) : 0;
af::setDevice(device);
af::info();
std::cout << "PHILOX" << std::endl;
if (!testRandomEngineNormalChi2<float>(AF_RANDOM_ENGINE_PHILOX))
std::cout << "FAILURE" << std::endl;
std::cout << "THREEFRY" << std::endl;
if (!testRandomEngineNormalChi2<float>(AF_RANDOM_ENGINE_THREEFRY))
std::cout << "FAILURE" << std::endl;
std::cout << "MERSENNE" << std::endl;
if (!testRandomEngineNormalChi2<float>(AF_RANDOM_ENGINE_MERSENNE))
std::cout << "FAILURE" << std::endl;
if (af::isDoubleAvailable(device)) {
std::cout << "PHILOX" << std::endl;
if (!testRandomEngineNormalChi2<double>(AF_RANDOM_ENGINE_PHILOX))
std::cout << "FAILURE" << std::endl;
std::cout << "THREEFRY" << std::endl;
if (!testRandomEngineNormalChi2<double>(AF_RANDOM_ENGINE_THREEFRY))
std::cout << "FAILURE" << std::endl;
std::cout << "MERSENNE" << std::endl;
if (!testRandomEngineNormalChi2<double>(AF_RANDOM_ENGINE_MERSENNE))
std::cout << "FAILURE" << std::endl;
}
}Getting the Anyway, for me this fails very quickly I am not sure if this indicates an error in my code or is an example of the problem you have identified here. I have not tried to build a patched version. |
|
@rstub This is great. I will work with this and see if I can improve the results. I am able to go further(fails after 13 steps) with the latest changes but not by much. Thanks for your input! |
bec40dc to
e8a1e62
Compare
|
@rstub I pushed my latest changes to this branch and the normal distributions look better. The issue was with the box muller transform function we had been using to generate the normal distribution. For floats we were incorrectly passing in the wrong arguments and there was a subtle bug in one of the conversion methods. I made a couple changes to the test you posed above and I would appreciate if you could run and review the tests. @willat343 Could you also give this branch a try and give me any feedback on the results? |
|
I have checked out the branch. Works very well! I have only tested with BTW, good that you switched back to |
|
Thanks @rstub. I was curious about your choice for the dof as well. Your explanation makes sense. Can you give me your reasoning behind the probability values you chose for that function. I am trying to figure out if those values apply to other types like f16 or if they need to be modified for the limited precision of that type. The actual distribution for f16 looks reasonable when you print out the histogram of the random number array but quickly fails the chi test. I suspect it is because of the reduced resolution of the type. |
9f96a9f to
036470f
Compare
|
@umar: The range of chi^2 is just a heuristic. Large enough to not trigger spurious failures in the cases I looked at but small enough to catch the errors identified at that time (c.f. #2007). Have you tried running the test for |
3900496 to
6aab251
Compare
Move the RNG quality checks to a separate file to avoid memory allocation failures because the random tests are running parallely with other tests. These tests are marked as SERIAL so they are running serially.
* Fix rounding of some operations by using fused operations like sincospi and fma instead of a multiply add. * Convert half constants to hex values and use __ushort_as_half to avoid redundant conversions from float * Pass integers instead of pointers in the OpenCL backend of the rng functions
The randn function was generating low quality floating point random numbers. This PR fixes and issue with the Box Muller parameters for floating point numbers.
Description
This bug fix addresses duplicate values when generating random numbers with floats. This issue was caused because the boxMullerTransform function was passed incorrect values for the second set of 2 floating point numbers.
This PR also enables the Chi squared tests for random number generators. These are heavier tests but we should start testing the quality to avoid issues like this. We are unfortunately not testing the randn function but its a step in the right direction. @rstub How difficult would it be to implement a similar test for the normal distribution.
Fixes: #2972
Changes to Users
N/A
Checklist
`[ ] Functions added to unified API[ ] Functions documented