
[Feature request]: Accelerate bernoulli number generation on CPU  #6940

@MlWoo


In the current version, Bernoulli random numbers are generated sequentially from a single, naive, serial stream. As a result, ops such as Dropout that draw from this stream perform poorly on CPU. Evidence from some cases shows the CPU running about 250x slower than the GPU, a gap far larger than the difference in theoretical peak performance would justify.
The code makes it clear that Bernoulli generation on GPU is not restricted to a single thread. Furthermore, the GPU code calls the CUDA library to generate random numbers (curand_uniform_double).

In fact, Intel-Caffe uses the VSL math library and OpenMP to do on CPU the same work that PyTorch does on GPU. We could borrow that code to help PyTorch perform better on CPU. However, I also notice that the seed of the random number stream can be set manually in the current version, so we may need to change the Intel-Caffe code slightly.
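
To make the seeding concern concrete, here is a minimal sketch of the general idea, not the Intel-Caffe or PyTorch code. It splits the output into fixed-size chunks and gives each chunk its own generator derived from the user-supplied seed, so the result depends only on the seed and not on how many workers run. The names `bernoulli_parallel`, `_fill_chunk`, and `CHUNK` are hypothetical, and the seed-mixing formula is an assumption chosen for illustration. Python threads will not give a real speedup here; a native version would use OpenMP threads and a library generator.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fixed chunk size. Keeping it constant fixes the mapping from
# output index to random stream, independent of the number of workers.
CHUNK = 1024


def _fill_chunk(args):
    seed, chunk_idx, count, p = args
    # One generator per chunk, derived from (seed, chunk index).
    # Assumption: this simple mixing is for illustration only and does not
    # guarantee statistically independent streams.
    rng = random.Random(seed * 1_000_003 + chunk_idx)
    return [1 if rng.random() < p else 0 for _ in range(count)]


def bernoulli_parallel(n, p, seed, workers=4):
    """Return n samples in {0, 1}, each equal to 1 with probability p."""
    tasks = [
        (seed, idx, min(CHUNK, n - start), p)
        for idx, start in enumerate(range(0, n, CHUNK))
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in task order, so the output layout is stable.
        chunks = list(pool.map(_fill_chunk, tasks))
    return [x for chunk in chunks for x in chunk]


if __name__ == "__main__":
    one = bernoulli_parallel(5000, 0.3, seed=42, workers=1)
    many = bernoulli_parallel(5000, 0.3, seed=42, workers=8)
    print("same output for 1 and 8 workers:", one == many)
```

A production version would replace the ad hoc per-chunk seeding with a proper stream-splitting method; VSL, for example, provides skip-ahead and leapfrog methods for partitioning one stream across threads.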

@cpuhrsch Your advice is important to us given your work on CPU performance. Could you spare some time to look into this part of the code? Looking forward to your thoughts.
