Conversation

@mfkasim1
Contributor

Fixes #92043.
I'm following numpy's implementation, as suggested by @min-jean-cho.
I found that this implementation still overflows when working with numbers greater than finfo.max / 2, but that is still much better than the previous implementation, which overflowed for numbers greater than finfo.max ** 0.5.
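For illustration, here is a minimal scalar sketch of the scaled ("Smith-style") division that numpy's implementation is based on. The actual ATen kernel is written differently (and vectorised), so this only conveys the idea:

import torch

finfo = torch.finfo(torch.complex128)
big = finfo.max / 4
x = torch.tensor(complex(big, big), dtype=torch.complex128)
y = torch.tensor(complex(big, big), dtype=torch.complex128)
a, b = x.real, x.imag
c, d = y.real, y.imag

# Naive formula: c*c + d*d overflows once |c|, |d| exceed roughly
# finfo.max ** 0.5, which is what the previous implementation suffered from.
naive = torch.complex((a * c + b * d) / (c * c + d * d),
                      (b * c - a * d) / (c * c + d * d))

# Smith-style scaling: divide through by the larger of |c|, |d| so the
# intermediate products stay representable.
if c.abs() >= d.abs():
    r = d / c
    den = c + d * r
    scaled = torch.complex((a + b * r) / den, (b - a * r) / den)
else:
    r = c / d
    den = c * r + d
    scaled = torch.complex((a * r + b) / den, (b * r - a) / den)

print(naive)   # nan+nanj
print(scaled)  # 1+0j, correct since x == y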

@pytorch-bot

pytorch-bot bot commented Jan 18, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/92539

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 6 Pending

As of commit 01e738b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@mfkasim1
Contributor Author

I found a very strange behaviour related to the errors on this PR. The tests pass for complex64 but fail for complex128, even though complex64 got no special treatment. Further investigation shows the strange behaviour below:

import torch

device = torch.device("cpu")
dtype = torch.complex128
finfo = torch.finfo(dtype)
nom = torch.tensor([complex(finfo.min / 2, finfo.min / 2),
                    complex(finfo.max / 2, finfo.max / 2),
                    complex(finfo.tiny, finfo.tiny),
                    complex(finfo.tiny, 0.0),
                    complex(0.0, 0.0)], dtype=dtype, device=device)
denom = torch.tensor([complex(finfo.min / 2, finfo.min / 2),
                      complex(finfo.max / 2, finfo.max / 2),
                      complex(finfo.tiny, finfo.tiny),
                      complex(0.0, finfo.tiny),
                      complex(finfo.tiny, finfo.tiny)], dtype=dtype, device=device)

print(nom[:1] / denom[:1])  # works fine
print(nom[:2] / denom[:2])  # works fine
print(nom[:3] / denom[:3])  # works fine
print(nom[:4] / denom[:4])  # WRONG: all nans!
print(nom[3:4] / denom[3:4])  # works fine
print(nom[2:4] / denom[2:4])  # works fine
print(nom[1:4] / denom[1:4])  # works fine
print(nom[1:5] / denom[1:5])  # WRONG: all nans

It seems that, with complex128, the results are wrong if the length of the operands is 4 or more. Any idea why this happens?
@min-jean-cho @lezcano

@lezcano
Collaborator

lezcano commented Jan 19, 2023

That path is vectorised, so it uses vectorised CPU operations. Have a look at how they are implemented within aten/src/ATen/cpu/vec. Fixing those while keeping performance reasonable is going to be trickier, though. If you want, add a test for tensors of length 1, and then submit a follow-up PR fixing the vectorised path and extending the tests. That way we can keep the size of the PRs reasonably small.
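A rough illustration of what such a length-1 check could exercise (a hedged sketch, not the actual test added in this PR): compare the batched result against element-by-element division, which is expected to avoid the SIMD path.

import torch

def compare_scalar_vs_batched(nom, denom):
    # Batched division may take the vectorised kernel; dividing 0-dim
    # elements one by one should go through the scalar code path.
    batched = nom / denom
    elementwise = torch.stack([n / d for n, d in zip(nom, denom)])
    return batched, elementwise

# With the nom/denom tensors from the snippet above, elementwise stays finite
# while batched can turn into all-nan once the length reaches the SIMD width.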

@mfkasim1
Contributor Author

Thanks, @lezcano!
Another alternative is to use hypot:

import torch

def div(lhs, rhs):
    # (a + i * b) / (c + i * d)
    a, b = lhs.real, lhs.imag
    c, d = rhs.real, rhs.imag
    inv_denom_sqrt = torch.hypot(c, d) ** (-1)
    a2 = a * inv_denom_sqrt
    b2 = b * inv_denom_sqrt
    c2 = c * inv_denom_sqrt
    d2 = d * inv_denom_sqrt
    real = a2 * c2 + b2 * d2
    imag = b2 * c2 - a2 * d2
    return real + 1j * imag

device = torch.device("cpu")
dtype = torch.complex128
finfo = torch.finfo(dtype)
nom = torch.tensor([complex(finfo.min / 2, finfo.min / 2),
                    complex(finfo.max / 2, finfo.max / 2),
                    complex(finfo.tiny, finfo.tiny),
                    complex(finfo.tiny, 0.0),
                    complex(0.0, 0.0)], dtype=dtype, device=device)
denom = torch.tensor([complex(finfo.min / 2, finfo.min / 2),
                      complex(finfo.max / 2, finfo.max / 2),
                      complex(finfo.tiny, finfo.tiny),
                      complex(0.0, finfo.tiny),
                      complex(finfo.tiny, finfo.tiny)], dtype=dtype, device=device)

div(nom, denom)  # works fine!

This way, we don't need to worry about the vectorization path. hypot is a pretty standard function and it's available as std::hypot. I also checked that hypot exists for the 256-bit vectorization (here). What do you think?
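A quick sanity check (not part of the PR) of why hypot helps at the extremes: the naive magnitude overflows while hypot rescales internally.

import torch

finfo = torch.finfo(torch.float64)
c = torch.tensor(finfo.max / 2)
d = torch.tensor(finfo.max / 2)

print(torch.sqrt(c * c + d * d))  # inf: c*c and d*d exceed finfo.max
print(torch.hypot(c, d))          # finite, ~1.27e308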

@lezcano
Collaborator

lezcano commented Jan 19, 2023

You'd still need to fix the AVX2 and AVX512 implementations of div accordingly. And sure, you can use hypot there; that may be faster.
When you do that, it'd still be good to throw in some benchmarks for good measure.
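A hedged sketch of how such a benchmark could be set up with torch.utils.benchmark, comparing the built-in division against the composite hypot-based div() defined earlier in the thread (sizes are arbitrary and purely illustrative):

import torch
import torch.utils.benchmark as benchmark

x = torch.randn(1_000_000, dtype=torch.complex128)
y = torch.randn(1_000_000, dtype=torch.complex128)

# `div` refers to the hypot-based composite implementation shown above.
t_builtin = benchmark.Timer(stmt="x / y", globals={"x": x, "y": y}).blocked_autorange()
t_composite = benchmark.Timer(stmt="div(x, y)",
                              globals={"x": x, "y": y, "div": div}).blocked_autorange()
print(t_builtin)
print(t_composite)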

@lezcano
Collaborator

lezcano commented Jan 19, 2023

At any rate, I'd still suggest first merging this PR, and then fixing the vectorised path on a follow-up PR.

If what you are proposing is to implement div as a composite operation, I don't think that's going to cut it. This is too basic of a building block to be able to afford that perf-wise. We need to fix these issues in the actual vectorised implementation of this operation.

@mfkasim1
Contributor Author

Thanks. I agree that perf might be a problem if we're using hypot for such a basic operation here.

@soulitzer added the triaged label Jan 20, 2023
@mfkasim1
Contributor Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased compldiv onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout compldiv && git pull --rebase)

@lezcano
Collaborator

lezcano left a comment

Cool! Looking forward to the vectorisation fix!

@mfkasim1
Contributor Author

Thanks, @lezcano!

@pytorchbot merge

pytorch-bot added the ciflow/trunk label Jan 20, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 mandatory check(s) failed (Rule Core Reviewers). The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team. Raised by workflow job.

@lezcano
Collaborator

lezcano commented Jan 20, 2023

This PR already fixed some operations, so you can remove the xfails on those! Also, it seems to be failing on Windows, so we need to fix that.

@lezcano
Collaborator

lezcano commented Jan 25, 2023

There's still at least one xfail that needs to be removed (there's an "unexpected success" in a test) but otherwise this is ready to go!

@lezcano
Collaborator

lezcano commented Jan 25, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed (Rule Core Reviewers). The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team. Raised by workflow job.

@mfkasim1
Contributor Author

It seems that the unexpected pass on softsign only happens on Mac, and the others are still failing. I suspect this might have something to do with the vectorization on the CPU.

template <>
template <typename T>
bool SoftsignFunctor<CPUContext>::operator()(
    const int N, const T* X, T* Y, CPUContext* /* context */) const {
  ConstEigenVectorArrayMap<T> X_arr(X, N);
  EigenVectorMap<T>(Y, N) = (T(1) + X_arr.abs()).inverse() * X_arr;
  return true;
}
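For reference, the test in question stresses softsign, x / (1 + |x|), at extreme complex magnitudes, which internally exercises complex division. A hedged way to poke at the ATen path from Python (the printed result differs by platform and build, which is exactly the issue being discussed):

import torch
import torch.nn.functional as F

finfo = torch.finfo(torch.float32)
z = torch.tensor([complex(finfo.max / 2, finfo.max / 2)], dtype=torch.complex64)
# May print a value near 0.707+0.707j or nan, depending on whether the
# platform's (vectorised) complex division overflows on the intermediates.
print(F.softsign(z))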

@lezcano
Collaborator

lezcano commented Jan 25, 2023

That code you found is from caffe. I don't think that code is tested in CI.
So, it seems that it was passing on CUDA (and MPS I guess as it's not failing) but not on CPU.
As such, adding a device="cpu" in the xfail should do the trick.

@mfkasim1
Contributor Author

mfkasim1 commented Jan 25, 2023

It fails (i.e. produces nan) only on CPU on non-Mac platforms (based on what I see here: https://hud.pytorch.org/pytorch/pytorch/pull/92539?sha=6effd2f1e5d6aa64aeb9dc1d729eb214cc52c592). So I guess we should also add active_if=not IS_MACOS.

@lezcano
Collaborator

lezcano commented Jan 25, 2023

It also passes on CUDA. See https://github.com/pytorch/pytorch/actions/runs/4004960425/jobs/6876076243 (or see how there are no failing CUDA jobs when you removed the xfail).

Comment on lines 12739 to 12740
"test_reference_numerics_large", dtypes=(torch.complex64,), device_type='cpu',
active_if=not IS_MACOS),),
Collaborator

I think the active_if is not necessary, as on Mac device_type == "mps", I believe. In any case, no need to change it for this PR. You can try removing it in the next PR if you feel like it; otherwise it's alright as well.

@lezcano
Collaborator

lezcano commented Jan 25, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed (Rule Core Reviewers). The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team. Raised by workflow job.

@lezcano
Collaborator

lezcano commented Jan 25, 2023

The test also seems to pass on CPU on Windows, lol.
I'm sorry that this is happening. Let's just wait for the CI to finish to see if there's any other case where it may be passing, and add those to the active_if=not ...

@mfkasim1
Contributor Author

No need to be sorry. I appreciate the complexity of pytorch (testing on a lot of platforms) and the fact that complex division is not a rarely used function. It seems that only Windows has the unexpected success, so I guess active_if=not IS_MACOS and not IS_WINDOWS should work (I hope my logic is still intact).

Do you know about the other error? RuntimeError: test_jit_fuser_te failed! Received signal: SIGIOT
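Roughly what the combined active_if condition suggested above could look like in the OpInfo skip list, extending the snippet quoted earlier in the thread. This is a hedged sketch: the DecorateInfo wrapper, import paths, and test-class name are reconstructed from memory and may not match the file exactly.

import unittest
import torch
from torch.testing._internal.common_utils import IS_MACOS, IS_WINDOWS
from torch.testing._internal.common_methods_invocations import DecorateInfo  # import path may differ

skips = (
    DecorateInfo(unittest.expectedFailure, 'TestUnaryUfuncs',
                 "test_reference_numerics_large", dtypes=(torch.complex64,),
                 device_type='cpu',
                 active_if=not IS_MACOS and not IS_WINDOWS),
)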

@lezcano
Collaborator

lezcano commented Jan 26, 2023

@pytorchbot merge

13th's a charm

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@mfkasim1
Contributor Author

13th's a charm

It really is! Thank you, @lezcano! I'll do the vectorisation next.

pytorchmergebot pushed a commit that referenced this pull request Feb 3, 2023
Fixes #92043 and completes #92539 by implementing the vectorized, more stable complex division.
I implement this using the internal `abs_` function to avoid branching. I also re-implement the internal `abs_` to make it more stable.

Pull Request resolved: #93277
Approved by: https://github.com/peterbell10, https://github.com/lezcano
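For context, a hedged Python sketch of the scaling idea behind a more stable complex abs; the follow-up PR implements this inside the vectorised C++ kernels, not in Python.

import torch

def stable_complex_abs(z):
    # Factor out the larger of |re|, |im| so the squares cannot overflow;
    # the ratio r is at most 1, so 1 + r*r stays small.
    a, b = z.real.abs(), z.imag.abs()
    hi = torch.maximum(a, b)
    lo = torch.minimum(a, b)
    r = torch.where(hi == 0, torch.zeros_like(hi), lo / hi)
    return hi * torch.sqrt(1 + r * r)

big = torch.finfo(torch.float64).max / 2
z = torch.tensor(complex(big, big), dtype=torch.complex128)
print(stable_complex_abs(z))  # finite (~1.27e308) where a naive sqrt(a*a + b*b) would give inf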

Labels: ciflow/trunk, Merged, open source, triaged

Linked issue: Wrongly returns nan for complex numbers division

6 participants