Logcumsumexp for CUDA (build-time optimized) #94310
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94310
Note: Links to docs will display an error until the docs builds have been completed. ❌ 12 Failures as of commit c05db6c: NEW FAILURES. The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Ah, actually, scratch the constexpr comments; it seems there are some implementation issues in custom scalar types.
// custom min and max to be used in logcumsumexp for complex arguments
template <typename scalar_t, bool min>
__host__ __device__ c10::complex<scalar_t> _logcumsumexp_minmax(const c10::complex<scalar_t>& x, const c10::complex<scalar_t>& y) {
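For readers skimming the diff, here is a rough standalone sketch of what such a helper can look like. It is an illustration only: `std::complex` stands in for `c10::complex`, plain functions stand in for `__host__ __device__` ones, and the dominance rule (compare real parts, i.e. the magnitudes of `exp(x)`) is an assumption about the helper's intent, not the PR's exact code.

```cpp
#include <complex>

// Illustrative sketch only: std::complex stands in for c10::complex.
// Comparing real parts is an assumed stand-in for the PR's actual rule,
// since the real part of x determines the magnitude of exp(x).
template <typename scalar_t, bool min>
std::complex<scalar_t> logcumsumexp_minmax_sketch(
    const std::complex<scalar_t>& x, const std::complex<scalar_t>& y) {
  // min vs. max is selected at instantiation time by the bool parameter
  return min ? (x.real() < y.real() ? x : y)
             : (x.real() > y.real() ? x : y);
}
```

Because the bool is a template parameter, the min/max choice is resolved when the function is instantiated, so the two call sites pay no runtime dispatch cost.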
Actually you can revert the templating arg too; it's a bit difficult to set this up in a constexpr if statement that is clean with all the non-constexpr conditions as well.
Also all the else statements are unnecessary since they all have return statements in them.
What difference does it make if we remove the else statements?
@mfkasim1 just removes extra indentation. That's why it's a nit. Don't really care either way.
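As an illustration of the nit (a generic example, not code from this PR), dropping `else` after early returns preserves behavior while flattening the indentation:

```cpp
// Generic example, not from the PR: both versions return the same values.

// Before: chained else branches, each containing a return.
double clamp_with_else(double v) {
  if (v < 0.0) {
    return 0.0;
  } else if (v > 1.0) {
    return 1.0;
  } else {
    return v;
  }
}

// After: the elses are removed; every path still returns exactly once.
double clamp_early_return(double v) {
  if (v < 0.0) {
    return 0.0;
  }
  if (v > 1.0) {
    return 1.0;
  }
  return v;
}
```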
// handling the "infectious" NaNs
return {std::numeric_limits<scalar_t>::quiet_NaN(), std::numeric_limits<scalar_t>::quiet_NaN()};
}
else if ((!::isfinite(min_real)) && (min_real == max_real)) {
nit, but a lot of the elses also aren't needed here, since it's all just dealing with early returns
malfet left a comment
Let's wait for binary build results, but otherwise looks good to me
@mfkasim1 Looks good to me. Feel free to trigger the merge whenever.

Thanks @malfet @Skylion007

@pytorchbot merge

Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again. You can rebase by leaving the following comment on this PR. Details for Dev Infra team: raised by workflow job.
@pytorchbot rebase

@pytorchbot successfully started a rebase job. Check the current status here.

Successfully rebased f192b9c to c05db6c.
@pytorchbot merge

Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 job has failed; the first few of them are: windows-binary-wheel / wheel-py3_9-cpu-test. Details for Dev Infra team: raised by workflow job.

@pytorchbot merge -f 'Unrelated infra issue. Broken smoketest label binaries'

Merge started: Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Hopefully fixes #89205.
This is another version of #90847, which was reverted because it increased the compile time significantly.
From my discussion with @ngimel in #93153 (comment), it seems the option of jiterator would be very tricky if not impossible.
So what I did was to optimize the compile-time in my computer.
To optimize the build time, I first compiled PyTorch as a whole, then changed only the LogcumsumexpKernel.cu file to see how it affects the compile time. Here are my results for the compilation time of only the LogcumsumexpKernel.cu file on my computer: if the previous PR increased the build time by 30 minutes on PyTorch's machines, then this PR reduces the build-time increase to about 6 minutes. Hopefully this is an acceptable level of build-time increase.
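The measurement loop described above can be sketched as follows. The build command is a placeholder for illustration, not the actual PyTorch build invocation; the idea is just to touch one `.cu` file so only that translation unit recompiles, then time the incremental build.

```cpp
#include <chrono>
#include <cstdio>
#include <cstdlib>

// Sketch of timing one incremental rebuild. The command string passed in is
// a placeholder (e.g. "ninja torch_cuda" after touching the kernel file),
// not the real PyTorch build command.
double time_incremental_rebuild(const char* build_cmd) {
  auto start = std::chrono::steady_clock::now();
  int rc = std::system(build_cmd);  // run the incremental build
  auto stop = std::chrono::steady_clock::now();
  if (rc != 0) std::fprintf(stderr, "build command failed\n");
  return std::chrono::duration<double>(stop - start).count();
}
```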
What I did was (sorted by how significantly it reduces the build time, starting from the most significant):

- Change log(x) to log1p(x - 1). This is applied in the infinite case, so we don't really care about precision.

tag: @malfet, @albanD
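The log1p substitution relies on the identity log1p(y) = log(1 + y), so log1p(x - 1) = log(x) for x > 0. A minimal standalone check (using host-side std::log1p rather than the CUDA device function):

```cpp
#include <cmath>

// For x > 0, log1p(x - 1) is mathematically identical to log(x), since
// log1p(y) = log(1 + y). This lets the kernel reuse one special function;
// per the PR, the affected (infinite) case does not need extra precision.
double log_via_log1p(double x) {
  return std::log1p(x - 1.0);
}
```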