Add CPU/CUDA support to torch.logcumsumexp #90847
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90847
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 Failures. As of commit d28c203: NEW FAILURES - the following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@albanD, could you please help me figure out how to enable automatic differentiation for this function with complex dtypes?
Sure! See pytorch/tools/autograd/gen_variable_type.py, line 171 (at 0b255b3).
This is still an allow-list, since most ops are not supported, and we want to be careful when new ops are added: it is easy to forget to test the complex case.
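For illustration, here is a minimal sketch of the allow-list pattern; the set contents below are assumed, not copied from the file, and only the mechanism matters: enabling complex autograd for a new op means adding its name to the set once its complex gradient is tested.

```python
# Hedged sketch of the allow-list pattern in tools/autograd/gen_variable_type.py.
# The entries shown here are illustrative; the real file lists every audited op.
GRADIENT_IMPLEMENTED_FOR_COMPLEX = {
    "add",
    "mul",
    "view",
    "logcumsumexp",  # added once the complex backward is implemented and tested
}
```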
The errors seem to be precision issues with the gradients that you compute, right?
@albanD yes, but it seems there are other unrelated errors that I don't know how to get rid of.
albanD left a comment:
Looks quite good! Only small comments.
Review thread on test/test_reductions.py (outdated):
You might want to define this as a ref for the OpInfo, and it will check that the ref matches for all simple inputs without the need for this custom code. Or are there more cases tested here that cannot be tested via the OpInfo?
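For context, a minimal sketch of what such a reference could look like (the helper name is hypothetical, not code from this PR; OpInfo's ref hook compares the op against a NumPy/SciPy implementation on simple inputs):

```python
import numpy as np
from scipy.special import logsumexp

def ref_logcumsumexp(x: np.ndarray, axis: int) -> np.ndarray:
    # Hypothetical reference: logsumexp over each prefix along `axis`.
    slices = []
    for i in range(x.shape[axis]):
        prefix = np.take(x, indices=range(i + 1), axis=axis)
        slices.append(logsumexp(prefix, axis=axis))
    return np.stack(slices, axis=axis)
```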
I found several challenges in using OpInfo (I probably don't understand OpInfo well enough to fully utilize it) for this complex logcumsumexp function:
- To compare two outputs of `logcumsumexp`, the imaginary part needs to be standardized to lie within `(0, 2*pi)` or `(-pi, pi)`. This is because `log(r * e^{i*t}) = log|r| + i * (t + 2*pi*n)`.
- `scipy.special.logsumexp` gives some confusing answers in cases involving `inf` (see my comment around line 590 of this file).
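To make the first point concrete, here is a minimal sketch of normalizing the imaginary part before comparing two complex outputs (`wrap_imag` is a hypothetical helper, not code from this PR):

```python
import math
import torch

def wrap_imag(x: torch.Tensor) -> torch.Tensor:
    # Map the imaginary part into [-pi, pi) so that results differing
    # by i*2*pi*n compare as equal.
    re, im = torch.real(x), torch.imag(x)
    im = torch.remainder(im + math.pi, 2 * math.pi) - math.pi
    return torch.complex(re, im)

# torch.allclose(wrap_imag(out), wrap_imag(expected)) is then a fair comparison.
```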
albanD left a comment:
Thanks for the update. Sounds good!
Thanks for the approval, @albanD! However, the last time the Windows instance ran, it still produced an error in the test. I can't reproduce this error on my machine; somehow the error is raised only on the Windows instance (not the others). It's related to an edge case involving `inf`.
Math libraries often differ on Windows and produce different results in edge cases; I think you can just skip the Windows test (we have quite a few such skips already).
Thanks, @ngimel! I'll push an update that skips the test on Windows.
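For reference, a sketch of what such a skip might look like (the test class, name, and tensor values are hypothetical; `IS_WINDOWS` comes from PyTorch's test utilities, and the call assumes complex `logcumsumexp` support from this PR):

```python
import unittest
import torch
from torch.testing._internal.common_utils import IS_WINDOWS

class TestLogcumsumexpComplex(unittest.TestCase):
    @unittest.skipIf(IS_WINDOWS, "Windows math libraries differ in inf edge cases")
    def test_inf_edge_case(self):
        # hypothetical edge-case input with a -inf real part
        x = torch.tensor([complex(float("-inf"), 1.0), complex(1.0, 2.0)])
        out = torch.logcumsumexp(x, dim=0)
        self.assertEqual(out.shape, x.shape)
```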
I think I'm done with the inf-nan bug on Windows. If any of you would like to comment on anything in this PR, please let me know.
albanD left a comment:
Sounds good to me
What's next for this?
Oh sorry, once the PR is approved, you can ask the bot to merge it yourself :) Try @pytorchbot -h
PyTorchBot help topics: Merge, Revert, Rebase, Label, Dr CI
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again. You can rebase by leaving the following comment on this PR: @pytorchbot rebase
@pytorchbot rebase
@pytorchbot successfully started a rebase job. Check the current status here.
Successfully rebased and force-pushed from f1b5289 to d28c203.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 3 jobs have failed; the first few of them are: linux-binary-libtorch-cxx11-abi / libtorch-cpu-shared-with-deps-cxx11-abi-build / build; trunk / macos-12-py3-arm64 / test (functorch, 1, 1, macos-m1-12); linux-binary-libtorch-pre-cxx11 / libtorch-cpu-shared-with-deps-pre-cxx11-build / build.
@pytorchbot merge -f "unrelated error"
You are not authorized to force merges to this repository. Please use the regular @pytorchbot merge command instead.
@pytorchbot merge -f "Flaky CI"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This PR increased nightly build time by 30 min; trying a few remedies.
We also observed this code timing out our build.
@mfkasim1, is the performance of this operation important? Such an increase in build time for a niche op is not great.
@pytorchbot revert -m "Reverting to decrease build time, let's discuss the alternatives here" -c weird
@pytorchbot successfully started a revert job. Check the current status here.
@mfkasim1 your PR has been successfully reverted.
This reverts commit 6498512. Reverted #90847 on behalf of https://github.com/malfet due to: "Reverting to decrease build time, let's discuss the alternatives here".
Yes, this is the bottleneck operation in my research.
Can you give some quantifiable numbers, i.e., how bad would the perf be if it's a composite op? Also, can we make it jiterator-able for just complex numbers? Or keep it out of core?
I tried implementing it with real `logcumsumexp` operations:

```python
import numpy as np
import torch

def _logcumsumexp(z: torch.Tensor, dim: int) -> torch.Tensor:
    # z is complex: z = q + i*k, so exp(z) = exp(q) * (cos(k) + i*sin(k))
    q = torch.real(z)
    k = torch.imag(z)
    # real and imaginary parts of cumsum(exp(z)), each kept in log-space
    a = _logcumsum_aexp(torch.cos(k), q, dim=dim)
    b = _logcumsum_aexp(torch.sin(k), q, dim=dim)
    # combine: log(A + i*B) = log(exp(a) + exp(b + i*pi/2))
    c = _log_add_exp(a, b + 0.5j * np.pi)
    return c

def _logcumsum_aexp(a: torch.Tensor, b: torch.Tensor, dim: int) -> torch.Tensor:
    # log(cumsum(a * exp(b))); a & b are real, but the returned values are complex.
    # Split a into positive and negative parts so both go through the real op.
    log_a_pos = torch.log(torch.clamp(a, min=torch.finfo(a.dtype).tiny))
    log_a_neg = torch.log(torch.clamp(-a, min=torch.finfo(a.dtype).tiny))
    lcse_pos = torch.logcumsumexp(b + log_a_pos, dim=dim)
    lcse_neg = torch.logcumsumexp(b + log_a_neg, dim=dim)
    # the negative part carries a phase of pi: -x = exp(log(x) + i*pi)
    return _log_add_exp(lcse_pos, lcse_neg + 1j * np.pi)

def _log_add_exp(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # log(exp(x) + exp(y)) for complex x & y, factoring out the larger real part
    xr = torch.real(x)
    xi = torch.imag(x) if torch.is_complex(x) else torch.zeros_like(x)
    yr = torch.real(y)
    yi = torch.imag(y) if torch.is_complex(y) else torch.zeros_like(y)
    x_greater = xr > yr
    rmax = torch.where(x_greater, xr, yr)
    imax = torch.where(x_greater, xi, yi)
    rmin = torch.where(x_greater, yr, xr)
    imin = torch.where(x_greater, yi, xi)
    return rmax + torch.log(torch.exp(1j * imax) + torch.exp(rmin - rmax + 1j * imin))
```
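Assuming the helpers above, a quick sanity check against the naive (overflow-prone) formulation could look like this; comparing `exp()` of both results sidesteps the `2*pi` phase ambiguity in the imaginary part:

```python
z = torch.randn(5, dtype=torch.complex128)
naive = torch.log(torch.cumsum(torch.exp(z), dim=0))
composite = _logcumsumexp(z, dim=0)
# exp() removes the i*2*pi*n ambiguity before comparison
assert torch.allclose(torch.exp(naive), torch.exp(composite), atol=1e-6)
```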
Partial work from #90847, in the direction of solving #89205. Most of the content is from #90847, but this is only for CPU, so hopefully it does not increase the build time by a lot. tag: @albanD, @malfet Pull Request resolved: #93153 Approved by: https://github.com/malfet, https://github.com/Skylion007
Hopefully fixes #89205. This is another version of #90847, which was reverted because it increased the compile time significantly. From my discussion with @ngimel in #93153 (comment), it seems the jiterator option would be very tricky, if not impossible. So what I did was to optimize the compile time on my machine. To do this, I first compiled PyTorch as a whole, then changed only the `LogcumsumexpKernel.cu` file to see how that changed the compile time. Here are my results for the compilation time of only the `LogcumsumexpKernel.cu` file on my machine:

- Original version (without any complex implementations): 56 s (about 1 minute)
- The previous PR (#90847): 13 m 57 s (about 14 minutes)
- This PR: 3 m 35 s (about 3.5 minutes)

If the previous PR increased the build time by 30 minutes on PyTorch's machines, then this PR should reduce the build-time increase to about 6 minutes. Hopefully this is an acceptable level of build-time increase. What I did was (sorted from the most significant build-time reduction; see the sketch below):

- Substituting `log(x)` with `log1p(x - 1)`. This is applied in the infinite case, so we don't really care about precision.
- Implementing the complex exponential manually.

tag: @malfet, @albanD
Pull Request resolved: #94310. Approved by: https://github.com/Skylion007, https://github.com/malfet
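The actual changes live in the C++/CUDA kernel; the following is a hedged Python rendering of the two ideas, for illustration only (function names are hypothetical):

```python
import torch

def complex_exp(z: torch.Tensor) -> torch.Tensor:
    # exp(a + i*b) = exp(a) * (cos(b) + i*sin(b)), written out manually
    # instead of calling the library's complex exp
    a, b = torch.real(z), torch.imag(z)
    return torch.exp(a) * torch.complex(torch.cos(b), torch.sin(b))

def log_via_log1p(x: torch.Tensor) -> torch.Tensor:
    # log(x) for real x rewritten as log1p(x - 1); used only on the infinite
    # branches, where precision near x == 1 does not matter
    return torch.log1p(x - 1)
```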
Another PR towards solving #89205.

What's in this PR:
- `logcumsumexp` for complex numbers in CPU & CUDA
- Backward of `logcumsumexp` for complex numbers
- Tests of `logcumsumexp` for complex numbers

What's missing:
- `gradgradcheck` of `logcumsumexp` (it complains `RuntimeError: logcumsumexp does not support automatic differentiation for outputs with complex dtype.`, and I don't know how to solve the error or where to put the test for the backward computation). If possible, I'd like this to be done in this PR.

It's really tricky to handle the edge cases here (i.e. the ones involving `inf`), but I've tried my best to put comments explaining the reasoning behind my decisions in this PR.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
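As a concrete instance of those edge cases (for real inputs, where the behavior is already defined): a `-inf` entry contributes `exp(-inf) = 0` to the cumulative sum, so it must pass through without poisoning later entries with `nan`:

```python
import torch

x = torch.tensor([float("-inf"), 0.0, 1.0])
print(torch.logcumsumexp(x, dim=0))
# tensor([  -inf, 0.0000, 1.3133])  -- log(e^0 + e^1) = log1p(e) ≈ 1.3133
```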