Fix NaN gradients in atan2_backward when both inputs are zero #166787
pushkar-hue wants to merge 5 commits into pytorch:main from …
Conversation
🔗 Helpful Links 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166787
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 Unrelated Failure) As of commit d3b3829 with merge base c5d91d9. UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk: …
This comment was automatically generated by Dr. CI and updates every 15 minutes.
|
@pytorchbot label "release notes: autograd"
```diff
- auto recip = (self * self + other * other).reciprocal();
+ auto denom = self * self + other * other;
+ auto recip = denom.reciprocal();
+ recip = at::where(denom == 0, at::zeros_like(recip), recip);
```
In this case, I think the 2nd arg can be a scalar or scalar Tensor. No need to allocate a large zeros matrix.
Also wondering whether there is a way to make this where update in-place. Isn't it the same as `recip[denom == 0] = 0` in Python shorthand? There should be a way to do a selective scalar assignment in C++ explicitly if the shorthand doesn't work. An in-place `where_` is also an option.
You are right, I don't need to create a Tensor; the second arg can just be a scalar. Also, I looked into it and found that the C++ equivalent of `recip[denom == 0] = 0` would be `recip.masked_fill_(denom == 0, 0)`. I am testing it out locally first; if it works, I'll push the changes.
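For reference, the two forms discussed above behave the same on a plain dense tensor. A minimal standalone Python sketch of the equivalence (illustrative only, not the PR's actual C++ code):

```python
import torch

self_ = torch.tensor([0.0, 1.0])
other = torch.tensor([0.0, 2.0])

denom = self_ * self_ + other * other
recip = denom.reciprocal()          # inf where denom == 0

# Python indexing shorthand...
recip_indexed = recip.clone()
recip_indexed[denom == 0] = 0

# ...and the in-place masked_fill_ counterpart mentioned in the reply above.
recip_filled = recip.clone()
recip_filled.masked_fill_(denom == 0, 0)

assert torch.equal(recip_indexed, recip_filled)   # both are [0.0, 0.2]
```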
|
@Skylion007 I have pushed the requested changes. You can take a look and let me know if there's anything I need to consider.
|
@Skylion007 I'm looking at the two failing CI checks:
I believe both failures are unrelated to my code changes. Could you please confirm? Thanks!
|
@pytorchbot rebase
Yeah failures look unrelated
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
|
Successfully rebased; branch updated from ca563e3 to d83a900 (Compare)
|
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed, first few of them are: trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable). Details for Dev Infra team: raised by workflow job.
|
Failures look real, and the solution seems to need a bit more complexity: you don't want to always do an in-place `masked_fill_`, because `masked_fill_` is not supported by NestedTensor, and in the higher-order gradients case you don't want to mutate saved tensors.
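To illustrate the constraint, here is a hedged Python sketch of one out-of-place shape a fix could take (not the PR's actual C++ change; it assumes a PyTorch version where `torch.where` accepts a scalar argument):

```python
import torch

def guarded_reciprocal(self_, other):
    # Build the guarded reciprocal without mutating any existing tensor,
    # so saved tensors stay intact for higher-order gradients.
    denom = self_ * self_ + other * other
    # A scalar second argument avoids allocating a zeros tensor.
    return torch.where(denom == 0, 0.0, denom.reciprocal())

print(guarded_reciprocal(torch.zeros(3), torch.zeros(3)))  # tensor([0., 0., 0.])
```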
|
Does my older solution, which avoids the in-place update of …, work? Or maybe I can do something like this … Let me know if this fixes the issue, or if I am missing something. @soulitzer
|
nvm, I tested it locally and the only solution that seems to work is just this. I'm not sure how efficient this out-of-place update of `recip` is, but it's the only solution that seems to work. I may have been mistaken.
|
Maybe try … Let's avoid keeping many temporary tensors around at once.
|
Thanks @soulitzer! Your suggestion worked perfectly. I have pushed the changes; they passed all the local tests.
Force-pushed from a900e96 to d3b3829 (Compare)
|
Sorry for the force push. I was getting some lint errors, so I had to rebase locally and resolve some merge conflicts to fix them. This should pass now, hopefully.
|
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…h#166787) Fixes pytorch#165427

## Description of Bug 🐛
As reported in pytorch#165427, when both inputs of the `atan2` function are zero, the gradient becomes `NaN`. During the forward pass, `atan2` successfully avoids the division-by-zero issue, but during backpropagation the gradients become `NaN`. This is because the backward pass calculates `(self * self + other * other).reciprocal()`, which becomes `inf` at `(0, 0)`. The subsequent multiplication by zero (`0 * inf`) results in `NaN`.

## Changes
- Added an `at::where` condition to handle zero denominators in `atan2_backward`.
- If denom is zero, return 0 for the reciprocal; otherwise, use the original value.

## Testing
- Added `test_atan2_zero_gradient` in `test/test_autograd.py` to verify `atan2` returns `0.0` gradients for `(0, 0)`.

Pull Request resolved: pytorch#166787
Approved by: https://github.com/soulitzer
Fixes #165427

## Description of Bug 🐛
As reported in #165427, when both inputs of the `atan2` function are zero, the gradient becomes `NaN`. During the forward pass, `atan2` successfully avoids the division-by-zero issue, but during backpropagation the gradients become `NaN`. This is because the backward pass calculates `(self * self + other * other).reciprocal()`, which becomes `inf` at `(0, 0)`. The subsequent multiplication by zero (`0 * inf`) results in `NaN`.

## Changes
- Added an `at::where` condition to handle zero denominators in `atan2_backward`.
- If denom is zero, return 0 for the reciprocal; otherwise, use the original value.

## Testing
- Added `test_atan2_zero_gradient` in `test/test_autograd.py` to verify `atan2` returns `0.0` gradients for `(0, 0)`.

cc: @soulitzer
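A minimal Python check along the same lines as the test described above (the exact assertions in `test_atan2_zero_gradient` may differ; this is just a sketch of the expected behavior):

```python
import torch

x = torch.zeros((), requires_grad=True)
y = torch.zeros((), requires_grad=True)

out = torch.atan2(x, y)   # forward is fine: atan2(0, 0) == 0
out.backward()

# Before this fix, both gradients were NaN (the unguarded reciprocal is inf
# at (0, 0) and 0 * inf == NaN); with the fix they should be exactly 0.
print(x.grad, y.grad)
assert not torch.isnan(x.grad) and not torch.isnan(y.grad)
```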