Solving the under/overflow for complex division #92539
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/92539. Note: links to docs will display an error until the docs builds have completed. ⏳ No failures, 6 pending as of commit 01e738b. This comment was automatically generated by Dr. CI and updates every 15 minutes.
I found a very strange behaviour regarding the error on this PR. The tests pass for:

```python
import torch

device = torch.device("cpu")
dtype = torch.complex128
finfo = torch.finfo(dtype)
nom = torch.tensor([complex(finfo.min / 2, finfo.min / 2),
                    complex(finfo.max / 2, finfo.max / 2),
                    complex(finfo.tiny, finfo.tiny),
                    complex(finfo.tiny, 0.0),
                    complex(0.0, 0.0)], dtype=dtype, device=device)
denom = torch.tensor([complex(finfo.min / 2, finfo.min / 2),
                      complex(finfo.max / 2, finfo.max / 2),
                      complex(finfo.tiny, finfo.tiny),
                      complex(0.0, finfo.tiny),
                      complex(finfo.tiny, finfo.tiny)], dtype=dtype, device=device)

print(nom[:1] / denom[:1])    # works fine
print(nom[:2] / denom[:2])    # works fine
print(nom[:3] / denom[:3])    # works fine
print(nom[:4] / denom[:4])    # WRONG: all nans!
print(nom[3:4] / denom[3:4])  # works fine
print(nom[2:4] / denom[2:4])  # works fine
print(nom[1:4] / denom[1:4])  # works fine
print(nom[1:5] / denom[1:5])  # WRONG: all nans
```

It seems like with
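For context (my own scalar sketch, not code from the PR): the textbook formula (a + bi)/(c + di) = ((ac + bd) + (bc - ad)i)/(c² + d²) fails exactly like this, because the intermediate c² + d² overflows long before the true quotient is out of range:

```python
def naive_div(a, b, c, d):
    # Textbook complex division (a + bi) / (c + di).
    # c*c + d*d overflows to inf once |c| or |d| exceeds roughly
    # sqrt(finfo.max), so the quotient degenerates to nan or 0.
    denom = c * c + d * d
    return ((a * c + b * d) / denom, (b * c - a * d) / denom)

# z / z should be exactly 1 + 0i, but every intermediate is inf:
print(naive_div(1e200, 1e200, 1e200, 1e200))  # (nan, nan)
```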
That path is vectorised, so it uses vectorised CPU operations. Have a look at how they are implemented within
Thanks, @lezcano!

```python
import torch

def div(lhs, rhs):
    # (a + i * b) / (c + i * d)
    a, b = lhs.real, lhs.imag
    c, d = rhs.real, rhs.imag
    inv_denom_sqrt = torch.hypot(c, d) ** (-1)
    a2 = a * inv_denom_sqrt
    b2 = b * inv_denom_sqrt
    c2 = c * inv_denom_sqrt
    d2 = d * inv_denom_sqrt
    real = a2 * c2 + b2 * d2
    imag = b2 * c2 - a2 * d2
    return real + 1j * imag

device = torch.device("cpu")
dtype = torch.complex128
finfo = torch.finfo(dtype)
nom = torch.tensor([complex(finfo.min / 2, finfo.min / 2),
                    complex(finfo.max / 2, finfo.max / 2),
                    complex(finfo.tiny, finfo.tiny),
                    complex(finfo.tiny, 0.0),
                    complex(0.0, 0.0)], dtype=dtype, device=device)
denom = torch.tensor([complex(finfo.min / 2, finfo.min / 2),
                      complex(finfo.max / 2, finfo.max / 2),
                      complex(finfo.tiny, finfo.tiny),
                      complex(0.0, finfo.tiny),
                      complex(finfo.tiny, finfo.tiny)], dtype=dtype, device=device)

div(nom, denom)  # works fine!
```

This way, we don't need to worry about the vectorization path.
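A scalar sketch of why this scaling helps (my illustration, same idea as the `div` above): `hypot` computes |c + di| without forming the overflowing intermediate c² + d², and after dividing through by it the scaled denominator lies on the unit circle, so |c2|, |d2| ≤ 1 and the products cannot overflow on their own:

```python
import math

def stable_div(a, b, c, d):
    # Scale all four components by 1 / |c + di|; math.hypot itself
    # avoids the overflowing intermediate c*c + d*d.
    h = math.hypot(c, d)
    a2, b2, c2, d2 = a / h, b / h, c / h, d / h
    return (a2 * c2 + b2 * d2, b2 * c2 - a2 * d2)

print(stable_div(1e200, 1e200, 1e200, 1e200))  # close to (1.0, 0.0)
```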
You'd still need to fix the AVX2 and AVX512 implementations of div accordingly. And sure, you can use hypot there; that may be faster.
At any rate, I'd still suggest first merging this PR and then fixing the vectorised path in a follow-up PR. If what you are proposing is to implement
Thanks. I agree that perf might be a problem if we're using hypot for such a basic operation here.
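The concern is measurable even in scalar code: hypot does extra scaling work to stay in range, so it is usually slower than the naive square root. A rough, machine-dependent micro-benchmark sketch (illustrative only; the vectorised kernels have different trade-offs):

```python
import math
import timeit

# math.hypot guards against under/overflow; sqrt(c*c + d*d) does not.
t_hypot = timeit.timeit(lambda: math.hypot(3.0, 4.0), number=100_000)
t_naive = timeit.timeit(lambda: math.sqrt(3.0 * 3.0 + 4.0 * 4.0), number=100_000)
print(f"hypot: {t_hypot:.4f}s, naive sqrt: {t_naive:.4f}s")
```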
@pytorchbot rebase
@pytorchbot successfully started a rebase job. Check the current status here
Successfully rebased e2b02f4 to a8dab03
lezcano left a comment:

Cool! Looking forward to the vectorisation fix!
Thanks, @lezcano! @pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 2 mandatory check(s) failed. Dig deeper by viewing the failures on hud. Details for Dev Infra team: raised by workflow job.
This PR already fixed some operations, so you can remove the xfails on those! Also, it seems to be failing on Windows, so we need to fix that.
There's still at least one xfail that needs to be removed (there's an "unexpected success" in a test), but otherwise this is ready to go!
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud. Details for Dev Infra team: raised by workflow job.
It seems that the unexpected pass on softsign is only on mac, and the others are still failing. I suspect this might have something to do with the vectorization on the CPU. (See pytorch/caffe2/operators/softsign_op.cc, lines 10 to 17 at a6ac922.)
That code you found is from Caffe2. I don't think that code is tested in CI.
It failed (i.e. produced nan) only on CPU for non-mac (based on what I see here: https://hud.pytorch.org/pytorch/pytorch/pull/92539?sha=6effd2f1e5d6aa64aeb9dc1d729eb214cc52c592). So I guess also adding
It also passes on CUDA. See https://github.com/pytorch/pytorch/actions/runs/4004960425/jobs/6876076243 (or see how there are no failing CUDA jobs when you removed the xfail). |
```python
"test_reference_numerics_large", dtypes=(torch.complex64,), device_type='cpu',
active_if=not IS_MACOS),),
```
I think that the active_if is not necessary, as on mac device_type == "mps", I believe. In any case, no need to change it for this PR. You can try removing it in the next PR if you feel like it; otherwise it's alright as well.
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud. Details for Dev Infra team: raised by workflow job.
The test also seems to pass on CPU on Windows lol
No need to be sorry. I appreciate the complexity of PyTorch (testing on a lot of platforms) and the fact that complex division is not a rarely used function. It seems it's only Windows that has the unexpected success, so I guess

Do you know about the other error?
@pytorchbot merge 13th's a charm |
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
it really is! Thank you, @lezcano! I’ll do the vectorisation next |
Fixes #92043 and completes #92539 by implementing the vectorized, more stable complex division. I implement this using the internal `abs_` function to avoid branching. I also re-implement the internal `abs_` to make it more stable. Pull Request resolved: #93277. Approved by: https://github.com/peterbell10, https://github.com/lezcano
Fixes #92043.
I'm following numpy's implementation as suggested by @min-jean-cho.
I found out that this implementation still produces overflow if we're working with numbers greater than `finfo.max / 2`, but this is still much better than the previous implementation, which overflowed with numbers greater than `finfo.max ** 0.5`.
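For reference, the classic scaled division that implementations like numpy's descend from is Smith's (1962) algorithm; a simplified scalar sketch (my paraphrase, for illustration; production versions add further rescaling for edge cases):

```python
def smith_div(a, b, c, d):
    # (a + bi) / (c + di), dividing through by whichever of |c|, |d|
    # is larger so the ratio r has magnitude at most 1.
    if abs(c) >= abs(d):
        r = d / c
        t = 1.0 / (c + d * r)
        return ((a + b * r) * t, (b - a * r) * t)
    else:
        r = c / d
        t = 1.0 / (c * r + d)
        return ((a * r + b) * t, (b * r - a) * t)

print(smith_div(1e300, 1e300, 1e300, 1e300))  # close to (1.0, 0.0)
```

Note that even Smith's method can overflow when the inputs approach the top of the representable range, which is consistent with the remaining `finfo.max / 2` limitation described above.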