[cuda] fix triu/tril int32 overflow for large matrices #164705
Aminsed wants to merge 2 commits into pytorch:main
Conversation
Fixes pytorch#136611

Cast blockIdx.x to int64_t before multiplication to prevent overflow when computing linear_idx for matrices larger than 2^31 elements.

cc @ngimel @ptrblck @eqy
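For context, here is a minimal sketch of the failure mode and the fix. This is not the actual ATen kernel; the kernel name, signature, and launch details are illustrative assumptions, and only the index computation mirrors the change described above.

```cuda
#include <cstdint>

// Hedged sketch of a triu-like kernel; names and signature are illustrative.
__global__ void triu_like_kernel(float* out, const float* in,
                                 int64_t rows, int64_t cols, int64_t diagonal) {
  // Buggy form: blockIdx.x * blockDim.x is evaluated in 32-bit arithmetic,
  // so it wraps once the grid covers more than 2^31 elements:
  //   int64_t linear_idx = blockIdx.x * blockDim.x + threadIdx.x;
  //
  // Fixed form: promote blockIdx.x to int64_t before the multiplication.
  int64_t linear_idx =
      static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (linear_idx >= rows * cols) {
    return;
  }
  int64_t row = linear_idx / cols;
  int64_t col = linear_idx % cols;
  // Keep elements on or above the requested diagonal, zero the rest.
  out[linear_idx] = (col - row >= diagonal) ? in[linear_idx] : 0.0f;
}
```

A 100000 x 100000 matrix has 10^10 elements, well past 2^31 (about 2.1 x 10^9), which is exactly the regime the regression test below exercises.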
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164705
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
✅ No Failures as of commit d968f75 with merge base 321e602.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "release notes: cuda"
To add the ciflow label, the pending workflows for this PR must be approved first. This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.
Lint errors need to be fixed.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@onlyCUDA
@largeTensorTest("40GB")
def test_triu_tril_large_matrix_64bit(self, device):
    """
    Test triu/tril with large matrices requiring 64-bit indexing.
    Regression test for https://github.com/pytorch/pytorch/issues/136611
    """
    # 100k x 100k matrix with 10B elements requires 64-bit indexing
    q_len = 100000
Looks like this test consistently fails on ROCM: https://github.com/pytorch/pytorch/actions/runs/18647707328/job/53167712663?pr=163955
Judging from the def, onlyCUDA does not seem to prevent ROCM, but there is skipCUDAIfRocm.
@skipCUDAIfRocm is what you're looking for. But I filed a DISABLED issue for this new test. We'll take a look as soon as possible as we continue to burn down skipped tests.
Fixes pytorch#136611
Cast blockIdx.x to int64_t before multiplication to prevent overflow when computing linear_idx for matrices larger than 2^31 elements.
Pull Request resolved: pytorch#164705
Approved by: https://github.com/eqy, https://github.com/ngimel