
Fix: nDims is mutated inside the loop in Shape.cu #165446

Closed
alexsibir wants to merge 1 commit into pytorch:main from alexsibir:export-D84612194

Conversation

@alexsibir
Contributor

Summary:
The nDims variable is mutated inside the loop but never restored to its original value.
This affects subsequent iterations of the outer loop.
Each batch iteration may get incorrect nDims after the first batch.

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D84612194

@pytorch-bot

pytorch-bot bot commented Oct 14, 2025

This appears to be a diff that was exported from phabricator, but the PR author does not have sufficient permissions to run CI. @alexsibir, please do step 2 of internal wiki to get write access so you do not need to get CI approvals in the future. If you think this is a mistake, please contact the Pytorch Dev Infra team.

@linux-foundation-easycla

linux-foundation-easycla bot commented Oct 14, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: alexsibir / name: Alex Sibiryakov (32afe38)

@pytorch-bot pytorch-bot bot added the release notes: cuda release notes category label Oct 14, 2025
@pytorch-bot

pytorch-bot bot commented Oct 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165446

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 32afe38 with merge base e6f766c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-codesync

meta-codesync bot commented Oct 14, 2025

@alexsibir has exported this pull request. If you are a Meta employee, you can view the originating Diff in D84612194.

@eqy
Collaborator

eqy commented Oct 14, 2025

Would it be possible to add a test for this?

alexsibir added a commit to alexsibir/pytorch that referenced this pull request Oct 14, 2025
Summary:
Pull Request resolved: pytorch#165446

The `nDims` variable is mutated inside the loop but never restored to its original value.
This affects subsequent iterations of the outer loop.
Each batch iteration may get incorrect `nDims` after the first batch.

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D84612194
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 14, 2025
@ngimel
Collaborator

ngimel commented Oct 15, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@ngimel
Collaborator

ngimel commented Oct 15, 2025

Note: by the looks of it, the PR shouldn't have changed the behavior. nDims is used before the batch loop starts; that's fine. Then inside the batch loop it is potentially changed behind the isInOutAligned conditional, and also used behind the stride_size > 1 conditional and behind the if (memory_format != c10::MemoryFormat::Contiguous) conditional, where a changed value could mean trouble. But isInOutAligned can be true only if stride_size == 1 and memory_format is c10::MemoryFormat::Contiguous. Here's how it's set in the beginning:

bool isInOutAligned = isContig && at::native::memory::get_alignment(data) >= alignment && memory_format == c10::MemoryFormat::Contiguous

(isContig is set to stride_size == 1 above), so both those uses should be unaffected. But the author claims it fixes their use case. A test would really be useful.

ngimel added a commit that referenced this pull request Oct 19, 2025
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025

Pull Request resolved: pytorch#165446
Approved by: https://github.com/ngimel
pytorchmergebot pushed a commit that referenced this pull request Oct 23, 2025
Per title

Pull Request resolved: #165853
Approved by: https://github.com/drisspg
@ngimel ngimel added this to the 2.9.1 milestone Nov 1, 2025
@ngimel
Collaborator

ngimel commented Nov 1, 2025

@pytorchbot cherry-pick --onto release/2.9 --fixes "fixes silent correctness bug" -c regression

@pytorchbot
Collaborator

Cherry picking #165446

Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x 4f400ab520f0151c8f01d7c305637276e4a222ca returned non-zero exit code 1

Auto-merging aten/src/ATen/native/cuda/Shape.cu
CONFLICT (content): Merge conflict in aten/src/ATen/native/cuda/Shape.cu
error: could not apply 4f400ab520f... Fix: nDims is mutated inside the loop in Shape.cu (#165446)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Details for Dev Infra team: raised by workflow job

@atalman atalman removed this from the 2.9.1 milestone Nov 6, 2025

Labels

  • ciflow/trunk (Trigger trunk jobs on your pull request)
  • fb-exported
  • Merged
  • meta-exported
  • release notes: cuda (release notes category)
