Support CPU Apply in ATen and implement standard_gamma using it #4161
Conversation
Main changes in this PR:
1) Added a TH_APPLY-style templatized function for CPU apply calls (currently only 2- and 3-tensor-argument versions are supported, but more are easy to add). This is basically identical to TH_APPLY, except that it uses ATen functions and the API is a template instead of a macro. The template takes an operation that is performed on the data, plus an indicator to signal early termination, so you don't need to know that x_data is a pointer to the current data location of x. A usage sketch follows this list.
2) Refactored the ATen dispatch code to easily generate dispatch functions for different subsets of the scalar types. This is preferable to the template_scalar path, which requires a valid specialization for each scalar type. Valid specializations are particularly annoying with CUDA because you most likely can't put them in a header, so you need to write some sort of for-all-scalar-types macro to get the correct specializations. Currently we only generate dispatch_all (all scalar types; the equivalent existed already) and dispatch_cpu_floating_types (which is used by standard_gamma).
3) Implemented standard_gamma using the above changes as a proof of concept (an arbitrary choice; it was the latest apply macro to be committed). The forward is bound via Declarations.yaml, the backward via the Apply template, and the two are hooked together in derivatives.yaml. This eliminates the need to change TH at all going forward, which means one can write idiomatic C++ instead of TH-style macros (e.g. TH_MATH_NAME).
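To make the apply API concrete, here is a rough usage sketch pieced together from the code excerpts further down (CPU_tensor_apply3, the functor signature with an early-exit flag, and type().tensor() for allocation are taken from the diff; the AddMulOp functor and add_mul function are made-up examples, and the exact headers and namespaces may differ from the actual code):

```cpp
#include <ATen/ATen.h>
#include <ATen/CPUApplyUtils.h>

using namespace at;

// Hypothetical element-wise op: ret = a + 2 * b. The apply template hands the
// functor references to the current elements plus an early-exit flag, so the
// op never has to touch raw data pointers or strides itself.
template <typename Scalar>
struct AddMulOp {
  void operator()(Scalar& ret_val, const Scalar& a_val, const Scalar& b_val,
                  bool& early_exit) {
    ret_val = a_val + Scalar(2) * b_val;
  }
};

// Hypothetical caller, mirroring the _standard_gamma_grad excerpt below.
Tensor add_mul(const Tensor& a, const Tensor& b) {
  Tensor ret = a.type().tensor(a.sizes());
  AddMulOp<float> op;
  CPU_tensor_apply3<float, AddMulOp<float>>(ret, a, b, op);
  return ret;
}
```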
I have a CUDA version implemented as well, but this seemed like a sensible place to split up the PR.
  const Type& the_type = self.type();
  dispatch_cpu_floating_types<StandardGammaGradOp>(the_type, "_standard_gamma_grad", ret, self, alpha);
  return ret;
}
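The dispatch_cpu_floating_types helper called here is generated, and its body is not shown in this excerpt. Based on the call site and the static F<Scalar>::apply pattern used by StandardGammaGradOp elsewhere in the diff, it presumably looks roughly like the following sketch (not the actual generated source):

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <ATen/ATen.h>

using namespace at;

// Sketch of a generated dispatcher restricted to CPU floating-point types.
// F is a class template exposing a static apply() for each scalar type;
// unsupported types raise an error that includes the operator name.
template <template <typename> class F, typename... Args>
void dispatch_cpu_floating_types(const Type& the_type, const char* name,
                                 Args&&... args) {
  switch (the_type.scalarType()) {
    case ScalarType::Double:
      F<double>::apply(std::forward<Args>(args)...);
      break;
    case ScalarType::Float:
      F<float>::apply(std::forward<Args>(args)...);
      break;
    default:
      throw std::runtime_error(std::string(name) +
                               " is only implemented for CPU floating types");
  }
}
```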
- arg: THTensor* output
  output: True
- arg: THGenerator* generator
  default: THPGenerator_TH_CData(THPDefaultGenerator)
@pytorchbot retest this please.

@pytorchbot retest this please

@pytorchbot retest this please
    ret_val = standard_gamma_grad_one(self_val, alpha_val);
  }

static void apply(Tensor& ret, const Tensor& self, const Tensor& alpha) {
template <typename Scalar>
struct StandardGammaGradOp {
  void operator()(Scalar& ret_val, const Scalar& self_val, const Scalar &alpha_val, bool& early_exit)
densities = ['Dense', 'Sparse']

# scalar_name, c_type, accreal, th_scalar_type, is_floating_type
 * loops.
 */

static inline void check_correct_backend(const Tensor &t, unsigned int pos) {
  check_correct_backend(t3, 3);
}

#define __ATH_TENSOR_APPLYX_PREAMBLE(TYPE, ATENSOR, DIM, ALLOW_CONTIGUOUS) \
}

template <typename ScalarType, typename Op>
void CPU_tensor_apply3_dim(Tensor& tensor1, Tensor& tensor2, Tensor& tensor3, int64_t dim, Op op) {
zdevito left a comment:
Awesome! We have apply in ATen! I listed some ways I think we can make the API simpler; let me know what you think.
static void apply(Tensor& ret, const Tensor& self, const Tensor& alpha) {
  StandardGammaGradOp<Scalar> op;
  CPU_tensor_apply3<Scalar, StandardGammaGradOp<Scalar>>(ret, self, alpha, op);
Tensor _standard_gamma_grad(const Tensor& self, const Tensor& alpha) {
  Tensor ret = self.type().tensor(self.sizes());
  dispatch_cpu_floating_types<StandardGammaGradOp>(self.type(), "_standard_gamma_grad", ret, self, alpha);
CC @fritzo, you may be interested in this.
aten/src/ATen/CPUApplyUtils.h (outdated)
  CPU_tensor_apply2_dim<ScalarType, Op>(tensor1, tensor2, -1, op);
}

template <typename ScalarType, typename Op>
    __ATH_TENSOR_APPLYX_UPDATE_COUNTERS(tensor3, 0)
  }
  if(tensor1_counter != NULL)
    delete [] tensor1_counter;
/** Computes the reparameterized gradient -(d/dalpha cdf(x;alpha)) / pdf(x;alpha)
    for random number x drawn from a standard Gamma distribution Gamma(alpha).
 */
template <typename Scalar>
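For reference, the formula in this comment is the standard implicit-reparameterization identity: writing the sample as x = F^{-1}(u; alpha) for fixed uniform u and differentiating F(x; alpha) = u with respect to alpha gives

```latex
\frac{\partial F}{\partial x}\,\frac{dx}{d\alpha} + \frac{\partial F}{\partial \alpha} = 0
\qquad\Longrightarrow\qquad
\frac{dx}{d\alpha} = -\,\frac{\partial F(x;\alpha)/\partial \alpha}{p(x;\alpha)},
```

since the partial derivative of the CDF with respect to x is the density p(x; alpha); this is exactly -(d/dalpha cdf(x;alpha)) / pdf(x;alpha) from the doc comment.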
// TODO Replace this with more accurate digamma().
template <typename Scalar>
static inline Scalar digamma_one(Scalar x) {
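Regarding that TODO: a more accurate digamma is typically built from the recurrence psi(x) = psi(x + 1) - 1/x plus an asymptotic series for large arguments. A minimal sketch of that approach (illustrative only, not the code in this PR; digamma_approx is a made-up name, and a production version would also need to handle poles and negative inputs):

```cpp
#include <cmath>

// Shift x upward via psi(x) = psi(x + 1) - 1/x until the asymptotic expansion
// psi(x) ~ ln(x) - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6) is accurate,
// then evaluate the series.
template <typename Scalar>
static inline Scalar digamma_approx(Scalar x) {
  Scalar result = 0;
  while (x < Scalar(6)) {        // recurrence region
    result -= Scalar(1) / x;
    x += Scalar(1);
  }
  const Scalar inv = Scalar(1) / x;
  const Scalar inv2 = inv * inv;
  // First few terms of the asymptotic series.
  result += std::log(x) - Scalar(0.5) * inv
            - inv2 * (Scalar(1) / Scalar(12)
                      - inv2 * (Scalar(1) / Scalar(120)
                                - inv2 * (Scalar(1) / Scalar(252))));
  return result;
}
```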
template <typename Scalar>
struct StandardGammaGradOp {
  void operator()(Scalar& ret_val, const Scalar& self_val, const Scalar &alpha_val, bool& early_exit)
tools/autograd/derivatives.yaml (outdated)
- name: zeros  # fallthrough

- name: _standard_gamma(Tensor self, Generator generator)
  self: grad * output._standard_gamma_grad(self)
if not isinstance(alpha, Variable):
    return torch._C._standard_gamma(alpha)
return _StandardGamma.apply(alpha)
return alpha._standard_gamma()
On early_exit (I can't comment directly because I've changed that code): to be fair, it's not totally exposing serial semantics; the apply function could enforce it in a thread-safe way (it would just be a suggested early exit at that point). But I'll just get rid of it for now, since we aren't going to implement something like that in this PR.
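For illustration, a thread-safe "suggested early exit" along those lines could look like the following sketch (not part of this PR; parallel_apply_with_early_exit and its shape are made up):

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// A shared atomic flag is checked before each element and may be set by any
// worker, so termination is cooperative ("stop soon") rather than exact.
template <typename Scalar, typename Op>
void parallel_apply_with_early_exit(std::vector<Scalar>& data, Op op) {
  std::atomic<bool> stop{false};
#pragma omp parallel for
  for (int64_t i = 0; i < static_cast<int64_t>(data.size()); ++i) {
    if (stop.load(std::memory_order_relaxed)) continue;  // can't break an OpenMP loop
    bool local_exit = false;
    op(data[i], local_exit);
    if (local_exit) stop.store(true, std::memory_order_relaxed);
  }
}
```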
colesbury left a comment:
lgtm
/** Computes the reparameterized gradient -(d/dalpha cdf(x;alpha)) / pdf(x;alpha)
    for random number x drawn from a standard Gamma distribution Gamma(alpha).
 */
template <typename CScalar>
Test failure doesn't look related; I'm going to merge this because I've run into a bunch of merge conflicts over the last few days. I think the only review comment left is from @zdevito: "(2) We use templated functions rather than classes. The reason for the functions was partial specialization. If you need partial specialization, then just write your own switch statement"; lmk if you want changes to this and I'll make them in a future commit.
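The technical point in that quoted comment is that class templates can be partially specialized while function templates cannot, so a function-based design needs overloads or an internal switch instead. A small illustration (the names are made up):

```cpp
// Generic class-template path.
template <typename Scalar, typename Op>
struct ApplyImpl {
  static void run(Op op) { /* generic path */ }
};

// OK: partial specialization of the class template on the scalar type.
template <typename Op>
struct ApplyImpl<double, Op> {
  static void run(Op op) { /* double-specific path */ }
};

// template <typename Op>
// void apply_impl<double, Op>(Op op) {}  // ill-formed: function templates
//                                        // cannot be partially specialized
```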
Guys, @fritzo and I think this breaks

@alicanb I'll take a look.

@gchanan This PR removes

On tensors or variables?

On tensors. I see it is a Variable method now?

Well that works, I'll just use it on Variables. Thanks.

Yah, since we are planning on merging Variables and tensors I didn't spend the extra effort to make it available on tensors (and it wasn't being used on tensors anyway). |