
RNG quality with OpenCL #2007

@rstub

Description

As part of https://github.com/RInstitute/rcpparrayfire I am trying to advertise ArrayFire's ability to produce random numbers very quickly. Before doing so, however, I wanted to check the RNGs, and I ran into problems when doing chi^2 tests with large numbers of random numbers. The original analysis used R in some parts, but I have recoded it in C++ here:

#include <iostream>
#include <iomanip>
#include <cassert>
#include <cstdlib>

#include <arrayfire.h>

// normalized chi^2 statistic for uniform distribution
// multiply with number of samples to get chi^2 statistic
float uchi2(af::array input) {
    assert(input.isvector());
    double expected = 1.0 / input.elements();
    af::array diff = input - expected;
    return af::sum<float>((diff * diff) / expected);
}

int main(int argc, char ** argv) {
    // optional command line arguments: backend index and device index
    int backend = argc > 1 ? atoi(argv[1]) : 0;
    af::setBackend(static_cast<af::Backend>(backend));
    int device = argc > 2 ? atoi(argv[2]) : 0;
    af::setDevice(device);
    af::info();

    double samples = 1e8;
    int steps = 300;
    int bins = 100;

    af::array total_hist = af::constant(0, bins, f32);
    af::array step_hist = af::constant(0, bins, f32);

    af::setDefaultRandomEngineType(AF_RANDOM_ENGINE_PHILOX);
    //af::setDefaultRandomEngineType(AF_RANDOM_ENGINE_THREEFRY);
    //af::setDefaultRandomEngineType(AF_RANDOM_ENGINE_MERSENNE);
    std::cout << std::setw(4) << "step"
              << std::setw(7) << "chi2_i"
              << std::setw(7) << "chi2_t"
              << std::setprecision(2) << std::fixed << std::endl;
    for(int i = 0; i < steps; ++i) {
        // draw the samples, bin them into [0, 1), and normalize to proportions
        step_hist = af::histogram(af::randu(samples), bins, 0.0, 1.0) / samples;
        // running mean of the per-step histograms
        total_hist = (total_hist * i + step_hist) / (i + 1);
        std::cout << std::setw(4) << i
                  << std::setw(7) << uchi2(step_hist) * samples
                  << std::setw(7) << uchi2(total_hist) * samples * (i + 1)
                  << std::endl;
    }
}

Here I am drawing 10^8 random numbers, calculating a histogram for them, and determining the chi^2 statistic. I do this 300 times, collecting the individual histograms into a total histogram, for which the chi^2 statistic is also computed. [Side note: I am using this unusual chi^2 on the normalized distribution because I get some strange, but probably unrelated, errors with a more straightforward implementation.]
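
For reference, the more straightforward implementation mentioned in the side note would operate on raw bin counts. This is only a minimal sketch of that form (my own paraphrase, not the exact code that produced the errors); it is algebraically equivalent to uchi2() once that is multiplied by the sample count:

// Hedged sketch: conventional chi^2 goodness-of-fit on raw bin counts.
// Assumes `counts` holds the un-normalized histogram and `samples` the total
// number of draws; equivalent to uchi2(counts / samples) * samples.
float chi2_counts(const af::array &counts, double samples) {
    double expected = samples / counts.elements();   // expected count per bin
    af::array diff = counts - expected;
    return af::sum<float>((diff * diff) / expected);
}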

If I run the program with PHILOX or THREEFRY on the OpenCL backend, I typically get something like this:

$ ./chi2 0 2
ArrayFire v3.5.1 (OpenCL, 64-bit Linux, build 0a675e8)
-0- INTEL: Intel(R) HD Graphics, 12367 MB
-1- BEIGNET: Intel(R) HD Graphics Skylake ULT GT2, 4096 MB
[2] INTEL: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, 15469 MB
step chi2_i chi2_t
   0 119.23 119.23
   1 101.52 115.02
   2 117.74 126.43
   3  89.05 113.38
   4  91.91 108.98
   5 113.97 104.43
   6 126.62  97.98
   7 111.21  88.87
[...]
 292 107.63 187.05
 293  84.17 186.90
 294  88.01 185.81
 295  79.39 185.76
 296  83.14 187.53
 297  86.37 188.46
 298 101.42 188.98
 299  95.11 190.88

In the beginning, both the individual and the total chi^2 are within the expected range (roughly 65 to 140 for 99 degrees of freedom), but around step 200 the total chi^2 begins to rise, reaching values that clearly indicate that the distribution is no longer uniform.
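
The quoted range comes from the chi^2 distribution with k = 99 degrees of freedom, which has mean k and variance 2k. A standalone back-of-the-envelope check (a hedged sketch, not part of the test program above):

// Rough sanity band for a chi^2 statistic with k = bins - 1 = 99 degrees of
// freedom: mean k, variance 2k, so a ~3 sigma interval is k +/- 3*sqrt(2k).
#include <cmath>
#include <iostream>

int main() {
    int k = 99;
    double sigma = std::sqrt(2.0 * k);               // ~14.07
    std::cout << "mean " << k
              << ", 3-sigma band [" << k - 3 * sigma
              << ", " << k + 3 * sigma << "]\n";     // roughly [57, 141]
}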

This does not happen if I use PHILOX or THREEFRY on the CPU backend or with MERSENNE on the OpenCL or CPU backend.
