-
Notifications
You must be signed in to change notification settings - Fork 552
Description
As part of https://github.com/RInstitute/rcpparrayfire I am trying to advertise ArrayFire's capability of producing random numbers real fast. But before doing so I wanted to check the RNGs and ran into problems when doing chi^2 tests with lots of random numbers. The original analysis used R in some parts, but I have recoded it in C++ here:
#include <iostream>
#include <iomanip>
#include <cassert>
#include <arrayfire.h>
// normalized chi^2 statistic for uniform distribution
// multiply with number of samples to get chi^2 statistic
float uchi2(af::array input) {
assert(input.isvector());
double expected = 1.0 / input.elements();
af::array diff = input - expected;
return af::sum<float>((diff * diff) / expected);
}
int main(int argc, char ** argv) {
int backend = argc > 1 ? atoi(argv[1]) : 0;
af::setBackend(static_cast<af::Backend>(backend));
int device = argc > 2 ? atoi(argv[2]) : 0;
af::setDevice(device);
af::info();
double samples = 1e8;
int steps = 300;
int bins = 100;
af::array total_hist = af::constant(0, bins, f32);
af::array step_hist = af::constant(0, bins, f32);
af::setDefaultRandomEngineType(AF_RANDOM_ENGINE_PHILOX);
//af::setDefaultRandomEngineType(AF_RANDOM_ENGINE_THREEFRY);
//af::setDefaultRandomEngineType(AF_RANDOM_ENGINE_MERSENNE);
std::cout << std::setw(4) << "step"
<< std::setw(7) << "chi2_i"
<< std::setw(7) << "chi2_t"
<< std::setprecision(2) << std::fixed << std::endl;
for(int i = 0; i < steps; ++i) {
step_hist = af::histogram(af::randu(samples), bins, 0.0, 1.0) / samples;
total_hist = (total_hist * i + step_hist) / (i + 1);
std::cout << std::setw(4) << i
<< std::setw(7) << uchi2(step_hist) * samples
<< std::setw(7) << uchi2(total_hist) * samples * (i +1)
<< std::endl;
}
}Here I am drawing 10^8 random numbers, calculate a histogram for them and determine the chi^2 statistic. I do this 500 times, collecting the individual histograms to form a total histogram, for which chi^2 statistic is also computed. [side note: I am using this unusual chi^2 for normalized distribution since I am getting some strange but probably unrelated errors when using a more straight forward implementation.]
If I run the program with PHILOX or THREEFRY on the OpenCL backend, I get typically something like this:
$ ./chi2 0 2
ArrayFire v3.5.1 (OpenCL, 64-bit Linux, build 0a675e8)
-0- INTEL: Intel(R) HD Graphics, 12367 MB
-1- BEIGNET: Intel(R) HD Graphics Skylake ULT GT2, 4096 MB
[2] INTEL: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, 15469 MB
step chi2_i chi2_t
0 119.23 119.23
1 101.52 115.02
2 117.74 126.43
3 89.05 113.38
4 91.91 108.98
5 113.97 104.43
6 126.62 97.98
7 111.21 88.87
[...]
292 107.63 187.05
293 84.17 186.90
294 88.01 185.81
295 79.39 185.76
296 83.14 187.53
297 86.37 188.46
298 101.42 188.98
299 95.11 190.88
In the beginning, both the individual and the total chi^2 are within the expected range (roughly 65 to 140 for 99 degrees of freedom), but at around step 200, the total chi^2 begins to rise reaching values that clearly indicate that the distribution is no longer uniform.
This does not happen if I use PHILOX or THREEFRY on the CPU backend or with MERSENNE on the OpenCL or CPU backend.