[Feature request]: Accelerate Bernoulli random number generation on CPU #6940

In the current version, Bernoulli random numbers on CPU are generated serially from a single, naive random stream. As a result, ops that call this stream, such as Dropout, perform poorly on CPU: in some cases CPU performance has been observed to be 250X slower than GPU. Given the peak theoretical performance of the two devices, the gap should not be that large.
The code makes it clear that [Bernoulli generation on GPU](https://github.com/pytorch/pytorch/blob/master/aten/src/THC/generic/THCTensorRandom.cu#L414) is not restricted to a single thread. Furthermore, it calls the CUDA random number library (cuRAND) to generate the underlying uniform numbers ([curand_uniform_double](https://github.com/pytorch/pytorch/blob/master/aten/src/THC/generic/THCTensorRandom.cu#L400)).

For comparison, [Intel-Caffe](https://github.com/intel/caffe) uses the [VSL math lib and OpenMP](https://github.com/intel/caffe/blob/9f0108f5a82d3a180f9b8837b7b09d72d8af8dcc/src/caffe/util/math_functions.cpp#L424-L457) to do on CPU the same parallel work that PyTorch does on GPU. We could borrow that code to improve PyTorch's performance on CPU. However, I also notice that in the current version the seed of the random number stream can be set manually, so the Intel-Caffe code may need slight changes to keep honoring a user-set seed.
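A minimal sketch of how parallel generation can still respect a manually set seed, in C++ for illustration only: the output buffer is split into chunks, and each chunk gets its own copy of the seeded generator advanced ("skipped ahead") to that chunk's first element, so the parallel result is bit-identical to one serial stream with the same seed. Assumptions, not taken from PyTorch or Intel-Caffe: `std::mt19937` stands in for a VSL stream, the names `bernoulli_serial`, `bernoulli_parallel`, `bernoulli_threshold` and the default chunk size are hypothetical, and each sample consumes exactly one 32-bit draw (compared against `p * 2^32`) so skip-ahead offsets are trivial to compute.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: threshold such that a uniform 32-bit draw is below it
// with probability ~p. Assumes 0 <= p <= 1; p == 1 gives 2^32, so every draw passes.
inline std::uint64_t bernoulli_threshold(double p) {
    return static_cast<std::uint64_t>(p * 4294967296.0);
}

// Serial reference: one stream fills the buffer in order, one draw per sample.
std::vector<std::uint8_t> bernoulli_serial(std::size_t n, double p, std::uint32_t seed) {
    std::mt19937 engine(seed);  // stand-in for a VSL stream (assumption)
    const std::uint64_t threshold = bernoulli_threshold(p);
    std::vector<std::uint8_t> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<std::uint64_t>(engine()) < threshold ? 1 : 0;
    }
    return out;
}

// Parallel version: each chunk copies the seeded stream and skips ahead to its
// first element, so chunk c reproduces draws [begin, end) of the serial stream.
std::vector<std::uint8_t> bernoulli_parallel(std::size_t n, double p, std::uint32_t seed,
                                             std::size_t chunk = 4096) {
    const std::uint64_t threshold = bernoulli_threshold(p);
    std::vector<std::uint8_t> out(n);
    const long long num_chunks = static_cast<long long>((n + chunk - 1) / chunk);
    // Chunks write disjoint ranges of `out`, so no synchronization is needed.
    #pragma omp parallel for schedule(static)
    for (long long c = 0; c < num_chunks; ++c) {
        const std::size_t begin = static_cast<std::size_t>(c) * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        std::mt19937 engine(seed);
        engine.discard(begin);  // skip-ahead; linear in `begin` for mt19937
        for (std::size_t i = begin; i < end; ++i) {
            out[i] = static_cast<std::uint64_t>(engine()) < threshold ? 1 : 0;
        }
    }
    return out;
}
```

Because the output depends only on the seed and not on the chunk size or thread count, compiling without OpenMP (the pragma is then ignored) gives the same result as a multi-threaded run. The catch is that `std::mt19937::discard` costs time proportional to the skip distance, so a real implementation would want a generator with cheap skip-ahead, such as a congruential generator, which is the kind of stream the Intel-Caffe code draws from VSL.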

@cpuhrsch Given your work on CPU performance, your advice would be very valuable here. Could you spare some time to look into this part of the code? Looking forward to your thoughts.

