
Conversation

@t-vi (Collaborator) commented Apr 23, 2018

Things I could use feedback on (in addition to any other areas for improvement you spot):

  • I accidentally moved digamma to ATen native before noticing that it was already available for CUDA and CPU in TH/THC. Should I expose the new digamma and drop the old, or should I back out the new digamma and use TH/THC?
  • My impression is that the digamma half implementation previously used float for intermediate results. I do not do this yet. Should I? (See the sketch after this list.)
  • One of the poisson tests seems to have started failing, but I'm not entirely sure what I changed to cause that.
  • I'm not super happy about the #ifdefs I needed to get the code to work in Distributions.h on both CUDA and CPU.
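A minimal sketch of the accumulate-at-float pattern in question (the function name compute_in_acc_type and the toy formula are placeholders, not code from this PR):

#include <type_traits>

#include <ATen/ATen.h>

// Sketch only: compute in float when scalar_t is at::Half and cast back to
// scalar_t just once for the final result.
template <typename scalar_t>
scalar_t compute_in_acc_type(scalar_t x) {
  using acc_t = typename std::conditional<
      std::is_same<scalar_t, at::Half>::value, float, scalar_t>::type;
  acc_t xa = static_cast<acc_t>(x);
  acc_t result = xa * xa / (xa + static_cast<acc_t>(1));  // stand-in for the real series
  return static_cast<scalar_t>(result);
}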

@apaszke (Contributor) left a comment

LGTM. Two minor nits. Did you need to pull in any new implementations of those math functions, or are they just coming from other places in our codebase?

return curand_normal(&state);
});
auto sample = sample_gamma<float>(alpha, standard_uniform, standard_normal);
ret_val = ::max(THCNumerics<scalar_t>::min(), scalar_cast<scalar_t>(sample));

return THRandom_normal(generator, 0.0, 1.0);
});
auto sample = sample_gamma<double>(alpha, standard_uniform, standard_normal);
ret_val = std::max(std::numeric_limits<scalar_t>::min(), (scalar_t) sample);
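One detail worth noting here: std::numeric_limits<scalar_t>::min() is the smallest positive normal value, not the most negative one, so this clamp keeps the returned sample strictly positive. A tiny self-contained check of that (illustrative only):

#include <algorithm>
#include <cassert>
#include <limits>

int main() {
  // numeric_limits<double>::min() is ~2.2e-308, i.e. positive, so the clamp
  // pushes a zero sample up to the smallest positive normal value.
  double sample = 0.0;
  double clamped = std::max(std::numeric_limits<double>::min(), sample);
  assert(clamped > 0.0);
  return 0;
}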

@t-vi (Collaborator, Author) commented Apr 23, 2018

Regarding new functions: no, I just moved things around from aten/src/TH and aten/src/ATen/native/Distributions.cpp. The copyright notice came from Rachit's patch, but the code below it was actually in Distributions.cpp before.

@fritzo (Collaborator) left a comment

Thanks for implementing this!

}

// Use a Rice saddle point expansion for large alpha.
if (alpha > 8.0f) {

// Computes the reparameterized gradient -(d/dalpha cdf(x;alpha)) / pdf(x;alpha)
// for random number x drawn from a standard Gamma distribution Gamma(alpha).
template <typename scalar_t>
deviceforcuda scalar_t standard_gamma_grad_one(scalar_t alpha, scalar_t x) {
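For context (a gloss on the comment above, not part of the review thread): the formula follows from implicitly differentiating the CDF. Holding the underlying uniform draw u fixed and writing x = F^{-1}(u; alpha), differentiating F(x; alpha) = u with respect to alpha gives dF/dalpha + p(x; alpha) * dx/dalpha = 0, hence dx/dalpha = -(dF/dalpha) / p(x; alpha), which is exactly the quantity standard_gamma_grad_one is documented to return.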

// Boost alpha for higher acceptance probability.
if (alpha < 1.0) {
scale *= std::pow(1 - standard_uniform.sample(), static_cast<scalar_t>(1.0) / alpha);
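The boost relies on the identity that if X ~ Gamma(alpha + 1) and U ~ Uniform(0, 1) are independent, then X * U^(1/alpha) ~ Gamma(alpha). A minimal sketch of that rescaling (sample_small_alpha and gamma_at_least_one are placeholder names, not the PR's code):

#include <cmath>
#include <random>

// Sketch: sample Gamma(alpha) for alpha < 1 by drawing from Gamma(alpha + 1)
// and rescaling by U^(1/alpha). gamma_at_least_one stands in for the
// Marsaglia-Tsang sampler sketched further below.
template <typename Sampler>
double sample_small_alpha(double alpha, std::mt19937& rng, Sampler gamma_at_least_one) {
  std::uniform_real_distribution<double> unif(0.0, 1.0);
  double boosted = gamma_at_least_one(alpha + 1.0, rng);
  // unif(rng) is in [0, 1), so 1 - unif(rng) stays strictly positive, which is
  // why the hunk above uses 1 - standard_uniform.sample() as well.
  return boosted * std::pow(1.0 - unif(rng), 1.0 / alpha);
}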

@rachtsingh (Contributor) commented:

It looks good to me! I too am not terribly happy with the #ifdef situation but it does work - thanks for following up on Adam's suggestion and getting this finished. The build failure looks spurious?


// This implements the acceptance-rejection method of Marsaglia and Tsang (2000)
// doi:10.1145/358407.358414
const scalar_t d = alpha - 1.0 / 3.0;
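For readers without the paper at hand, the core of Marsaglia and Tsang (2000) squeezes a Gamma(alpha, 1) sample (alpha >= 1) out of one normal and one uniform draw per iteration. A self-contained sketch of that loop in plain C++ (not the templated BaseSampler code in this PR):

#include <cmath>
#include <random>

// Marsaglia & Tsang (2000) acceptance-rejection sampler for Gamma(alpha, 1), alpha >= 1.
double marsaglia_tsang_gamma(double alpha, std::mt19937& rng) {
  std::normal_distribution<double> normal(0.0, 1.0);
  std::uniform_real_distribution<double> unif(0.0, 1.0);
  const double d = alpha - 1.0 / 3.0;
  const double c = 1.0 / std::sqrt(9.0 * d);
  while (true) {
    double x, v;
    do {
      x = normal(rng);
      v = 1.0 + c * x;
    } while (v <= 0.0);
    v = v * v * v;
    const double u = unif(rng);
    // Cheap squeeze test first, then the exact log acceptance test.
    if (u < 1.0 - 0.0331 * x * x * x * x) return d * v;
    if (std::log(u) < 0.5 * x * x + d * (1.0 - v + std::log(v))) return d * v;
  }
}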

*/
template <typename scalar_t>
deviceforcuda static inline scalar_t digamma_one(scalar_t x) {
using acc_scalar_t = typename std::conditional<std::is_same<scalar_t, at::Half>::value, float, scalar_t>::type;

Tensor _s_gamma_cuda(const Tensor& alpha, Generator* gen) {
Tensor ret = alpha.type().tensor(alpha.sizes());
auto alpha_ = alpha.toType(ScalarType::Float);
AT_DISPATCH_FLOATING_TYPES(ret.type(), "gamma", [&] {

@ezyang (Contributor) commented Apr 24, 2018

@pytorchbot retest this please

@t-vi (Collaborator, Author) commented Apr 24, 2018

While looking into adding an increment parameter to next_philox_seed, I started thinking about thread safety: I think there is a race condition where two threads might grab the same state and then each do an atomic add (in other words, it would be correct to have the two assignments in next_philox_seed happen as one atomic operation). Is that rare enough to neglect, given the overhead of having to lock?

@ngimel (Collaborator) commented Apr 24, 2018

@t-vi but that's what is done now: atomicAdd returns the "old" value, which is what the thread then uses. It is also thread-safe (i.e., two threads are guaranteed to get two different "old" values). Am I missing something?
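To make the guarantee concrete, here is a minimal host-side sketch of the fetch-and-add pattern being discussed (illustrative only; the real generator state and next_philox_seed live elsewhere in the codebase):

#include <atomic>
#include <cstdint>
#include <utility>

// Each caller atomically bumps a shared offset and keeps the value from
// *before* the bump. fetch_add is a single read-modify-write, so no two
// threads can ever observe the same "old" offset.
std::pair<uint64_t, uint64_t> next_philox_seed_sketch(
    std::atomic<uint64_t>& offset, uint64_t seed, uint64_t increment) {
  const uint64_t old_offset = offset.fetch_add(increment);
  return {seed, old_offset};
}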

@t-vi (Collaborator, Author) commented Apr 24, 2018

Ah yes. I was confused. Thank you!

@apaszke (Contributor) left a comment

LGTM. Needs some final fixes and should be good to merge.

AT_DISPATCH_FLOATING_TYPES(ret.type(), "poisson", [&] {
poisson_cuda_kernel<scalar_t>(ret, lambda_, next_philox_seed(gen));
AT_DISPATCH_FLOATING_TYPES_AND_HALF(ret.type(), "poisson", [&] {
poisson_cuda_kernel<cuda::type<scalar_t>>(ret, lambda, next_philox_seed(gen, 20));

BaseSampler<scalar_t> standard_normal([generator] () {
return THRandom_normal(generator, 0.0, 1.0);
});
auto sample = sample_gamma<scalar_t, scalar_t>(alpha, standard_uniform, standard_normal);


void THTensor_(standard_gamma)(THTensor *self, THGenerator *_generator, THTensor *alpha)
{
std::lock_guard<std::mutex> lock(_generator->mutex);

#if (defined(__CUDACC_VER_MAJOR__) && (__CUDACC_VER_MAJOR__ < 9))
template<typename R, typename T>
deviceforcuda R cast_wrapper(T v) { return scalar_cast<R>(v); }
#else

@ezyang merged commit c10da63 into pytorch:master Apr 26, 2018
Jorghi12 pushed a commit to wsttiger/pytorch that referenced this pull request May 10, 2018
* Refactor standard_gamma and implement CUDA gamma sampling

* Attempt fixes for AT_CUDA_ENABLED changes

* Gamma cuda and cpu forward as ATen native

* implement standard_gamma_grad_cuda

* update native_test.cpp, try to fix windows and various cuda version compiles

* searching a windows fix via CI... use std:: for math

* casting some constants in the calculation, compute at float for half precision

* whitespace fixes

* add acctype to do half->float computation, include HALF in generation, cast locally rather than tensors

* fix cuda8 half compilation

* always use scalar_cast with CUDACC, lock CPU generator, CPU acctype = double

Thank you for your review comments!
weiyangfb pushed a commit to weiyangfb/pytorch that referenced this pull request Jun 11, 2018