Skip to content

Less aggressive persistent reduction when it could induce large masking with dynamic shapes#163365

Closed
eellison wants to merge 6 commits intogh/eellison/826/basefrom
gh/eellison/826/head
Closed

Less aggressive persistent reduction when it could induce large masking with dynamic shapes#163365
eellison wants to merge 6 commits intogh/eellison/826/basefrom
gh/eellison/826/head

Conversation

@eellison
Copy link
Contributor

@eellison eellison commented Sep 19, 2025

Stack from ghstack (oldest at bottom):

As per comment in source code:

            # If we are are coalescing on xblock (not ReductionHint.INNER) and this is not a tiny kernel
            # (not ReductionHint.OUTER_TINY), do not use persistent reduction if it induces tile
            # quantization. Peristent reduction forces rblock == rnumel, if the bounds between lower
            # and upper are large, for the lower values we will be masking off large % of read/writes,
            # when we could expand the coalescing xblock instead.

For the test case in question, this pr improves perf from 0.8573521325143717 -> 0.043151492193814305 because we were egregiously masking out rblock values (58/64 values).

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

Differential Revision: D82853279

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Sep 19, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163365

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2669b24 with merge base 636a511 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

eellison added a commit that referenced this pull request Sep 19, 2025
…ng with dynamic shapes

ghstack-source-id: e6fcb92
Pull Request resolved: #163365
[ghstack-poisoned]
eellison added a commit that referenced this pull request Sep 19, 2025
…ng with dynamic shapes

ghstack-source-id: fac50cf
Pull Request resolved: #163365
[ghstack-poisoned]
eellison added a commit that referenced this pull request Sep 19, 2025
…ng with dynamic shapes

ghstack-source-id: 6907ee2
Pull Request resolved: #163365
@eellison eellison added the topic: not user facing topic category label Sep 19, 2025
@eellison
Copy link
Contributor Author

@eellison has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 19, 2025
[ghstack-poisoned]
eellison added a commit that referenced this pull request Sep 19, 2025
…ng with dynamic shapes

ghstack-source-id: 460d763
Pull Request resolved: #163365
@eellison eellison requested a review from jansel September 22, 2025 16:40
[ghstack-poisoned]
eellison added a commit that referenced this pull request Sep 22, 2025
…ng with dynamic shapes

ghstack-source-id: 0d32866
Pull Request resolved: #163365
Copy link
Contributor

@PaulZhang12 PaulZhang12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

[ghstack-poisoned]
eellison added a commit that referenced this pull request Sep 23, 2025
…ng with dynamic shapes

ghstack-source-id: 79a60ba
Pull Request resolved: #163365
@eellison
Copy link
Contributor Author

@eellison has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@eellison
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/rexporting the PR!

Details for Dev Infra team Raised by workflow job

@eellison
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
…ng with dynamic shapes (pytorch#163365)

As per comment in source code:
```
            # If we are are coalescing on xblock (not ReductionHint.INNER) and this is not a tiny kernel
            # (not ReductionHint.OUTER_TINY), do not use persistent reduction if it induces tile
            # quantization. Peristent reduction forces rblock == rnumel, if the bounds between lower
            # and upper are large, for the lower values we will be masking off large % of read/writes,
            # when we could expand the coalescing xblock instead.
```

For the test case in question, this pr improves perf from 0.8573521325143717 -> 0.043151492193814305 because we were egregiously masking out rblock values (58/64 values).

Differential Revision: [D82853279](https://our.internmc.facebook.com/intern/diff/D82853279)
Pull Request resolved: pytorch#163365
Approved by: https://github.com/shunting314, https://github.com/PaulZhang12, https://github.com/jansel, https://github.com/v0i0
jainapurva pushed a commit that referenced this pull request Sep 29, 2025
…ng with dynamic shapes (#163365)

As per comment in source code:
```
            # If we are are coalescing on xblock (not ReductionHint.INNER) and this is not a tiny kernel
            # (not ReductionHint.OUTER_TINY), do not use persistent reduction if it induces tile
            # quantization. Peristent reduction forces rblock == rnumel, if the bounds between lower
            # and upper are large, for the lower values we will be masking off large % of read/writes,
            # when we could expand the coalescing xblock instead.
```

For the test case in question, this pr improves perf from 0.8573521325143717 -> 0.043151492193814305 because we were egregiously masking out rblock values (58/64 values).

Differential Revision: [D82853279](https://our.internmc.facebook.com/intern/diff/D82853279)
Pull Request resolved: #163365
Approved by: https://github.com/shunting314, https://github.com/PaulZhang12, https://github.com/jansel, https://github.com/v0i0
@github-actions github-actions bot deleted the gh/eellison/826/head branch October 24, 2025 02:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants