
Conversation

@vishwakftw (Contributor)

This closes #8940.


Tensor pow_backward_self(Tensor grad, const Tensor & self, const Tensor & exponent) {
  Tensor zero_mask = (exponent == 0.0);
  return at::where(zero_mask, zeros_like(self), grad * exponent * self.pow(exponent - 1));
}
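
(Editor's note: a quick Python-level illustration of what this masked backward yields for a zero entry in a tensor exponent; the values are illustrative and assume a build that includes this change.)

import torch

x = torch.tensor([0., 2., 3.], requires_grad=True)
p = torch.tensor([0., 1., 2.])   # exponent tensor with one zero entry
x.pow(p).sum().backward()
print(x.grad)                    # tensor([0., 1., 6.]): the zero-exponent entry gets 0 instead of the nan from 0 * 0**-1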

@ssnl (Collaborator) previously requested changes Jun 27, 2018

need test in test_autograd.py

@facebook-github-bot (Contributor) left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@vadimkantorov (Contributor)

Does an at::where overload exist that can accept a float tensor instead of a binary mask tensor? If it exists, then the mask allocation could also be eliminated (if needed).

@facebook-github-bot (Contributor) left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@vishwakftw (Contributor, Author)

@pytorchbot retest this please

@ezyang (Contributor)

ezyang commented Jun 28, 2018

@vadimkantorov Unfortunately not. This might be a good addition (and pretty easy to add), if there are other cases where we might need it. Maybe we should generalize this into some sort of arbitrary equality test against a floating point number? (CC @colesbury @cpuhrsch for more opinions).
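
(Editor's note, for context: torch.where, like at::where, takes a boolean/byte condition tensor, so the comparison result is materialized as a separate mask. A minimal illustrative sketch, not from this PR:)

import torch

exponent = torch.tensor([0., 1., 2.])
zero_mask = exponent == 0.0      # a separate mask tensor, as in the C++ snippet above
out = torch.where(zero_mask, torch.zeros_like(exponent), torch.ones_like(exponent))
print(out)                       # tensor([0., 1., 1.])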

I do have a question beyond what this patch does: when the exponent is close to zero (but not exactly zero), what happens to the gradients? Is it numerically stable? If it's not, it would be nice (though not strictly necessary) to fix that too.

@vishwakftw (Contributor, Author)

vishwakftw commented Jun 28, 2018

@ezyang As the exponent tends to 0, the derivative tends to 0 for non-zero x and to inf at x = 0. This behaves correctly in PyTorch:

>>> a
tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], requires_grad=True)
>>> a.pow(0.0001).sum().backward()
>>> a.grad
tensor([   inf, 0.0003, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0000, 0.0000,
        0.0000])
>>> a.pow(0.00001).sum().backward()
>>> a.grad
tensor([   inf, 0.0003, 0.0002, 0.0001, 0.0001, 0.0001, 0.0001, 0.0000, 0.0000,
        0.0000])
>>> a.pow(0.01).sum().backward()
>>> a.grad
tensor([   inf, 0.0103, 0.0052, 0.0035, 0.0026, 0.0021, 0.0017, 0.0015, 0.0013,
        0.0012])
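
(Editor's note: a quick analytic cross-check, not from the thread. Since d/dx x^p = p * x^(p-1), the gradient shrinks toward 0 for x > 0 as p tends to 0, while at x = 0 with 0 < p < 1 the factor x^(p-1) is infinite. A minimal sketch with illustrative values; torch.autograd.grad is used here so gradients are not accumulated into a.grad across calls.)

import torch

x = torch.tensor([0., 1., 2.], requires_grad=True)
for p in (1e-2, 1e-4):
    g, = torch.autograd.grad(x.pow(p).sum(), x)
    print(p, g)   # first entry is inf (x = 0); the rest equal p * x**(p - 1) and shrink as p shrinks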

@vadimkantorov (Contributor)

vadimkantorov commented Jun 28, 2018

@ezyang Thinking more about it, merging a comparison op and torch.where is essentially an op-fusion question (maybe it's a good case for the automatic fuser?). Comparison followed by torch.where is a frequent use case, I guess, but surfacing all comparison ops through where may add unjustified API complexity.

The zero special case would work here only because of a lucky coincidence: the exponent happens to be compared against zero.

@ezyang (Contributor)

ezyang commented Jun 29, 2018

@vadimkantorov Certainly, "where" would be easy to support in the JIT fuser. I'd also be OK with a special case just for the zero test. Up to you guys!

@vishwakftw (Contributor, Author)

I think that for the purposes of this PR, the usage of where can remain as currently designed. If the semantics of where change later, this part of the code can be revisited and modified accordingly. What do you think, @ezyang?

@ezyang (Contributor)

ezyang commented Jun 29, 2018

Works for me.

@facebook-github-bot (Contributor) left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@vishwakftw vishwakftw deleted the pow-0-derivative branch June 29, 2018 13:55
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
Summary:
This closes pytorch#8940 .
Closes pytorch#8945

Differential Revision: D8668853

Pulled By: ezyang

fbshipit-source-id: 80a629352ee2f506c38a05647b769281579a5af7

Successfully merging this pull request may close this issue: Issue in a particular case in backpropagation.
