Support CPU Apply in ATen and implement standard_gamma using it #4161
Conversation
Main changes in this PR:
1) Added a TH_APPLY-style templatized function for CPU apply calls (currently only 2- and 3-tensor-argument versions are supported, but more are easy to add). This is basically identical to TH_APPLY, except that it uses ATen functions and the API is a template instead of a macro. The template takes an operation that is performed on the data, plus an indicator to signal early termination, so you don't need to know that x_data is a pointer to the current data location of x. A usage sketch follows this list.
2) Refactored the ATen dispatch code to easily generate dispatch functions for different subsets of the scalar types. This is preferable to the template_scalar path, which requires a valid specialization for each scalar type. Valid specializations are particularly annoying with CUDA because you most likely can't put them in a header, so you need to write some sort of for-all-scalar-types macro to get the correct specializations. Currently we only generate dispatch_all (all scalar types; the equivalent existed already) and dispatch_cpu_floating_types (which is used by standard_gamma).
3) Implemented standard_gamma using the above changes as a proof of concept (an arbitrary choice; it was the latest apply macro to be committed). The forward is bound via Declarations.yaml, the backward via the Apply template, and the two are hooked together in derivatives.yaml. This eliminates the need to change TH at all going forward, which means one can write idiomatic C++ instead of TH-style macros (e.g. TH_MATH_NAME).
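To make the apply API concrete, here is a rough usage sketch pieced together from the code excerpts further down (CPU_tensor_apply3, the functor signature with an early-exit flag, and type().tensor() for allocation are taken from the diff; the AddMulOp functor and add_mul function are made-up examples, and the exact headers and namespaces may differ from the actual code):

```cpp
#include <ATen/ATen.h>
#include <ATen/CPUApplyUtils.h>

using namespace at;

// Hypothetical element-wise op: ret = a + 2 * b. The apply template hands the
// functor references to the current elements plus an early-exit flag, so the
// op never has to touch raw data pointers or strides itself.
template <typename Scalar>
struct AddMulOp {
  void operator()(Scalar& ret_val, const Scalar& a_val, const Scalar& b_val,
                  bool& early_exit) {
    ret_val = a_val + Scalar(2) * b_val;
  }
};

// Hypothetical caller, mirroring the _standard_gamma_grad excerpt below.
Tensor add_mul(const Tensor& a, const Tensor& b) {
  Tensor ret = a.type().tensor(a.sizes());
  AddMulOp<float> op;
  CPU_tensor_apply3<float, AddMulOp<float>>(ret, a, b, op);
  return ret;
}
```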
I have a CUDA version implemented as well, but this seemed like a sensible place to split up the PR.
  const Type& the_type = self.type();
  dispatch_cpu_floating_types<StandardGammaGradOp>(the_type, "_standard_gamma_grad", ret, self, alpha);
  return ret;
}
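The dispatch_cpu_floating_types helper called here is generated, and its body is not shown in this excerpt. Based on the call site and the static F<Scalar>::apply pattern used by StandardGammaGradOp elsewhere in the diff, it presumably looks roughly like the following sketch (not the actual generated source):

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <ATen/ATen.h>

using namespace at;

// Sketch of a generated dispatcher restricted to CPU floating-point types.
// F is a class template exposing a static apply() for each scalar type;
// unsupported types raise an error that includes the operator name.
template <template <typename> class F, typename... Args>
void dispatch_cpu_floating_types(const Type& the_type, const char* name,
                                 Args&&... args) {
  switch (the_type.scalarType()) {
    case ScalarType::Double:
      F<double>::apply(std::forward<Args>(args)...);
      break;
    case ScalarType::Float:
      F<float>::apply(std::forward<Args>(args)...);
      break;
    default:
      throw std::runtime_error(std::string(name) +
                               " is only implemented for CPU floating types");
  }
}
```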
- arg: THTensor* output
  output: True
- arg: THGenerator* generator
  default: THPGenerator_TH_CData(THPDefaultGenerator)
@pytorchbot retest this please.

@pytorchbot retest this please

@pytorchbot retest this please
    ret_val = standard_gamma_grad_one(self_val, alpha_val);
  }

static void apply(Tensor& ret, const Tensor& self, const Tensor& alpha) {
template <typename Scalar>
struct StandardGammaGradOp {
  void operator()(Scalar& ret_val, const Scalar& self_val, const Scalar &alpha_val, bool& early_exit)
densities = ['Dense', 'Sparse']

# scalar_name, c_type, accreal, th_scalar_type, is_floating_type
 * loops.
 */

static inline void check_correct_backend(const Tensor &t, unsigned int pos) {
  check_correct_backend(t3, 3);
}

#define __ATH_TENSOR_APPLYX_PREAMBLE(TYPE, ATENSOR, DIM, ALLOW_CONTIGUOUS) \
}

template <typename ScalarType, typename Op>
void CPU_tensor_apply3_dim(Tensor& tensor1, Tensor& tensor2, Tensor& tensor3, int64_t dim, Op op) {
zdevito left a comment:
Awesome! We have apply in ATen! I listed some ways I think we can make the API simpler; let me know what you think.
static void apply(Tensor& ret, const Tensor& self, const Tensor& alpha) {
  StandardGammaGradOp<Scalar> op;
  CPU_tensor_apply3<Scalar, StandardGammaGradOp<Scalar>>(ret, self, alpha, op);
Tensor _standard_gamma_grad(const Tensor& self, const Tensor& alpha) {
  Tensor ret = self.type().tensor(self.sizes());
  dispatch_cpu_floating_types<StandardGammaGradOp>(self.type(), "_standard_gamma_grad", ret, self, alpha);
CC @fritzo, you may be interested in this.
aten/src/ATen/CPUApplyUtils.h (outdated)
  CPU_tensor_apply2_dim<ScalarType, Op>(tensor1, tensor2, -1, op);
}

template <typename ScalarType, typename Op>
    __ATH_TENSOR_APPLYX_UPDATE_COUNTERS(tensor3, 0)
  }
  if(tensor1_counter != NULL)
    delete [] tensor1_counter;
/** Computes the reparameterized gradient -(d/dalpha cdf(x;alpha)) / pdf(x;alpha)
    for random number x drawn from a standard Gamma distribution Gamma(alpha).
 */
template <typename Scalar>
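For reference, the formula in this comment is the standard implicit-reparameterization identity: writing the sample as x = F^{-1}(u; alpha) for fixed uniform u and differentiating F(x; alpha) = u with respect to alpha gives

```latex
\frac{\partial F}{\partial x}\,\frac{dx}{d\alpha} + \frac{\partial F}{\partial \alpha} = 0
\qquad\Longrightarrow\qquad
\frac{dx}{d\alpha} = -\,\frac{\partial F(x;\alpha)/\partial \alpha}{p(x;\alpha)},
```

since the partial derivative of the CDF with respect to x is the density p(x; alpha); this is exactly -(d/dalpha cdf(x;alpha)) / pdf(x;alpha) from the doc comment.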
// TODO Replace this with more accurate digamma().
template <typename Scalar>
static inline Scalar digamma_one(Scalar x) {
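Regarding that TODO: a more accurate digamma is typically built from the recurrence psi(x) = psi(x + 1) - 1/x plus an asymptotic series for large arguments. A minimal sketch of that approach (illustrative only, not the code in this PR; digamma_approx is a made-up name, and a production version would also need to handle poles and negative inputs):

```cpp
#include <cmath>

// Shift x upward via psi(x) = psi(x + 1) - 1/x until the asymptotic expansion
// psi(x) ~ ln(x) - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6) is accurate,
// then evaluate the series.
template <typename Scalar>
static inline Scalar digamma_approx(Scalar x) {
  Scalar result = 0;
  while (x < Scalar(6)) {        // recurrence region
    result -= Scalar(1) / x;
    x += Scalar(1);
  }
  const Scalar inv = Scalar(1) / x;
  const Scalar inv2 = inv * inv;
  // First few terms of the asymptotic series.
  result += std::log(x) - Scalar(0.5) * inv
            - inv2 * (Scalar(1) / Scalar(12)
                      - inv2 * (Scalar(1) / Scalar(120)
                                - inv2 * (Scalar(1) / Scalar(252))));
  return result;
}
```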
template <typename Scalar>
struct StandardGammaGradOp {
  void operator()(Scalar& ret_val, const Scalar& self_val, const Scalar &alpha_val, bool& early_exit)
tools/autograd/derivatives.yaml (outdated)
- name: zeros  # fallthrough

- name: _standard_gamma(Tensor self, Generator generator)
  self: grad * output._standard_gamma_grad(self)
if not isinstance(alpha, Variable):
    return torch._C._standard_gamma(alpha)
return _StandardGamma.apply(alpha)
return alpha._standard_gamma()
On early_exit (I can't comment directly because I've changed that code): to be fair, it's not totally exposing serial semantics; the apply function could enforce it in a thread-safe way (it would just be a suggested early exit at that point). But I'll just get rid of it for now, since we aren't going to implement something like that in this PR.
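For illustration, a thread-safe "suggested early exit" along those lines could look like the following sketch (not part of this PR; parallel_apply_with_early_exit and its shape are made up):

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// A shared atomic flag is checked before each element and may be set by any
// worker, so termination is cooperative ("stop soon") rather than exact.
template <typename Scalar, typename Op>
void parallel_apply_with_early_exit(std::vector<Scalar>& data, Op op) {
  std::atomic<bool> stop{false};
#pragma omp parallel for
  for (int64_t i = 0; i < static_cast<int64_t>(data.size()); ++i) {
    if (stop.load(std::memory_order_relaxed)) continue;  // can't break an OpenMP loop
    bool local_exit = false;
    op(data[i], local_exit);
    if (local_exit) stop.store(true, std::memory_order_relaxed);
  }
}
```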
colesbury left a comment:
lgtm
/** Computes the reparameterized gradient -(d/dalpha cdf(x;alpha)) / pdf(x;alpha)
    for random number x drawn from a standard Gamma distribution Gamma(alpha).
 */
template <typename CScalar>
Test failure doesn't look related; I'm going to merge this because I've run into a bunch of merge conflicts over the last few days. I think the only review comment left is from @zdevito: "(2) We use templated functions rather than classes. The reason for the functions was partial specialization. If you need partial specialization, then just write your own switch statement"; lmk if you want changes to this and I'll make them in a future commit.
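The technical point in that quoted comment is that class templates can be partially specialized while function templates cannot, so a function-based design needs overloads or an internal switch instead. A small illustration (the names are made up):

```cpp
// Generic class-template path.
template <typename Scalar, typename Op>
struct ApplyImpl {
  static void run(Op op) { /* generic path */ }
};

// OK: partial specialization of the class template on the scalar type.
template <typename Op>
struct ApplyImpl<double, Op> {
  static void run(Op op) { /* double-specific path */ }
};

// template <typename Op>
// void apply_impl<double, Op>(Op op) {}  // ill-formed: function templates
//                                        // cannot be partially specialized
```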
Guys, @fritzo and I think this breaks

@alicanb I'll take a look.

@gchanan This PR removes

On tensors or variables?

On tensors. I see it is a Variable method now?

Well that works, I'll just use it on Variables. Thanks.

Yah, since we are planning on merging Variables and tensors I didn't spend the extra effort to make it available on tensors (and it wasn't being used on tensors anyway). |