Less aggressive persistent reduction when it could induce large masking with dynamic shapes#163365
eellison wants to merge 6 commits into gh/eellison/826/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163365
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 2669b24 with merge base 636a511. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@eellison has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge
Merge failed. Reason: This PR has internal changes and must be landed via Phabricator! Please try re-importing/re-exporting the PR! Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Less aggressive persistent reduction when it could induce large masking with dynamic shapes (pytorch#163365)

As per comment in source code:

```
# If we are coalescing on xblock (not ReductionHint.INNER) and this is not a tiny kernel
# (not ReductionHint.OUTER_TINY), do not use persistent reduction if it induces tile
# quantization. Persistent reduction forces rblock == rnumel; if the bounds between lower
# and upper are large, for the lower values we will be masking off a large % of reads/writes,
# when we could expand the coalescing xblock instead.
```

For the test case in question, this PR improves perf from 0.8573521325143717 -> 0.043151492193814305 because we were egregiously masking out rblock values (58/64 values).

Differential Revision: [D82853279](https://our.internmc.facebook.com/intern/diff/D82853279)

Pull Request resolved: pytorch#163365

Approved by: https://github.com/shunting314, https://github.com/PaulZhang12, https://github.com/jansel, https://github.com/v0i0
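To make the heuristic concrete, below is a minimal, self-contained sketch of the decision the quoted comment describes. This is not Inductor's actual implementation; the `should_use_persistent_reduction` signature, the explicit `rnumel_lo`/`rnumel_hi` bounds, and the 0.5 masking threshold are illustrative assumptions.

```python
from enum import Enum, auto


class ReductionHint(Enum):
    INNER = auto()       # loads coalesce along the reduction (r) dimension
    OUTER = auto()       # loads coalesce along the x dimension
    OUTER_TINY = auto()  # tiny kernel; persistent reduction is cheap regardless
    DEFAULT = auto()


def should_use_persistent_reduction(hint: ReductionHint,
                                    rnumel_lo: int,
                                    rnumel_hi: int,
                                    max_masked: float = 0.5) -> bool:
    """Persistent reduction pins rblock to rnumel's upper bound, so with dynamic
    shapes the low end of the range runs with (rnumel_hi - rnumel_lo) lanes
    masked off.  If that wasted fraction is large and the kernel coalesces on
    xblock anyway, prefer a looped reduction and let xblock grow instead."""
    if hint in (ReductionHint.INNER, ReductionHint.OUTER_TINY):
        # Coalescing on rblock, or a tiny kernel: persistent reduction stays attractive.
        return True
    masked_at_lower_bound = (rnumel_hi - rnumel_lo) / rnumel_hi
    return masked_at_lower_bound <= max_masked


# Dynamic rnumel in [6, 64] while coalescing on xblock -> skip persistent reduction.
print(should_use_persistent_reduction(ReductionHint.OUTER, 6, 64))  # False
```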
Stack from ghstack (oldest at bottom):
As per the comment added in the source code (quoted in the merge commit message above): when coalescing on xblock and the kernel is not tiny, do not use persistent reduction if the dynamic rnumel range would induce tile quantization and mask off a large fraction of reads/writes.
For the test case in question, this PR improves perf from 0.8573521325143717 -> 0.043151492193814305 because we were egregiously masking out rblock values (58/64 values).
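As a quick back-of-the-envelope check on that figure (the live-lane count is inferred from the 58/64 ratio, and the ~20x factor assumes the two perf numbers are times, lower being better):

```python
rblock = 64                 # persistent reduction pins rblock to rnumel's upper bound
masked_lanes = 58           # lanes masked off at the low end of the dynamic range
live_lanes = rblock - masked_lanes
print(f"live lanes: {live_lanes}, masked fraction: {masked_lanes / rblock:.1%}")  # 6, 90.6%
print(f"improvement: {0.8573521325143717 / 0.043151492193814305:.1f}x")           # ~19.9x
```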
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben
Differential Revision: D82853279