[cuda] fix triu/tril int32 overflow for large matrices #164705
Aminsed wants to merge 2 commits into pytorch:main
Conversation
Fixes pytorch#136611

Cast blockIdx.x to int64_t before multiplication to prevent overflow when computing linear_idx for matrices larger than 2^31 elements.

cc @ngimel @ptrblck @eqy
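For context, here is a minimal sketch of the failure mode and the fix. This is not the actual ATen kernel; the kernel name, signature, and launch details are illustrative assumptions, and only the index computation mirrors the change described above.

```cuda
#include <cstdint>

// Hedged sketch of a triu-like kernel; names and signature are illustrative.
__global__ void triu_like_kernel(float* out, const float* in,
                                 int64_t rows, int64_t cols, int64_t diagonal) {
  // Buggy form: blockIdx.x * blockDim.x is evaluated in 32-bit arithmetic,
  // so it wraps once the grid covers more than 2^31 elements:
  //   int64_t linear_idx = blockIdx.x * blockDim.x + threadIdx.x;
  //
  // Fixed form: promote blockIdx.x to int64_t before the multiplication.
  int64_t linear_idx =
      static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (linear_idx >= rows * cols) {
    return;
  }
  int64_t row = linear_idx / cols;
  int64_t col = linear_idx % cols;
  // Keep elements on or above the requested diagonal, zero the rest.
  out[linear_idx] = (col - row >= diagonal) ? in[linear_idx] : 0.0f;
}
```

A 100000 x 100000 matrix has 10^10 elements, well past 2^31 (about 2.1 x 10^9), which is exactly the regime the regression test below exercises.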
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164705
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
✅ No Failures as of commit d968f75 with merge base 321e602.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "release notes: cuda"
To add the ciflow label, the pending workflows for this PR must be approved first. This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.
Lint errors need to be fixed.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@onlyCUDA
@largeTensorTest("40GB")
def test_triu_tril_large_matrix_64bit(self, device):
    """
    Test triu/tril with large matrices requiring 64-bit indexing.
    Regression test for https://github.com/pytorch/pytorch/issues/136611
    """
    # 100k x 100k matrix with 10B elements requires 64-bit indexing
    q_len = 100000
Looks like this test consistently fails on ROCM: https://github.com/pytorch/pytorch/actions/runs/18647707328/job/53167712663?pr=163955
Judging from the def, onlyCUDA does not seem to prevent ROCM, but there is skipCUDAIfRocm.
@skipCUDAIfRocm is what you're looking for. But I filed a DISABLED issue for this new test. We'll take a look as soon as possible as we continue to burn down skipped tests.
Fixes pytorch#136611
Cast blockIdx.x to int64_t before multiplication to prevent overflow when computing linear_idx for matrices larger than 2^31 elements.
Pull Request resolved: pytorch#164705
Approved by: https://github.com/eqy, https://github.com/ngimel