[ROCm][inductor] More configs for pointwise kernels. #166470
naromero77amd wants to merge 3 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166470
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit ac96ea4 with merge base 1e836bc. This comment was automatically generated by Dr. CI and updates every 15 minutes.
PaulZhang12 left a comment:
LGTM! Seems like another one of those layout issues with num_warps=1....
        num_stages=2,
        waves_per_eu=1,  # 20% improvement
    ),
    triton_config_with_settings(
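To make the shape of the diff above concrete, here is a minimal, self-contained sketch of what such a pointwise config candidate might look like. The helper `make_config` and the field layout are assumptions standing in for inductor's actual `triton_config_with_settings(...)`; `waves_per_eu` is the ROCm-specific occupancy hint mentioned in the diff.

```python
# Hypothetical sketch (NOT PyTorch's actual API) of extra ROCm pointwise
# autotuning candidates. `make_config` mimics the role of
# `triton_config_with_settings(...)` from the diff above.
def make_config(xblock, num_warps, num_stages, waves_per_eu=None):
    cfg = {
        "XBLOCK": xblock,
        "num_warps": num_warps,
        "num_stages": num_stages,
    }
    if waves_per_eu is not None:
        # ROCm-only occupancy hint; it has no CUDA equivalent.
        cfg["waves_per_eu"] = waves_per_eu
    return cfg

def rocm_pointwise_configs():
    # Extra candidates that would be tried only on ROCm/HIP builds.
    return [
        make_config(2048, num_warps=8, num_stages=2, waves_per_eu=1),
        make_config(4096, num_warps=4, num_stages=2),
    ]
```

The autotuner would simply append these to the usual candidate list and benchmark each one; the `waves_per_eu=1` entry is the one the diff annotates with a 20% improvement.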
Can we conditionalize this on whether an atomic add is actually present?
I don't see how we can do this.
@jataylo any ideas?
See num_stores in the heuristics:
pytorch/torch/_inductor/codegen/triton.py, line 5052 (at e0604d3)
At the moment, `tl.atomic_add` ops are counted as stores, so I could not distinguish a kernel with one atomic_add from a kernel with one ordinary store. I had to add another field to the inductor_meta structure.
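The distinction described above can be sketched in a few lines. This is not PyTorch's actual code, and the field name `num_atomic_adds` is an assumption; it only illustrates why counting atomic adds separately, rather than folding them into the store count, lets the heuristic tell the two kernels apart.

```python
# Minimal sketch of counting atomic adds separately from plain stores,
# so "one atomic_add" is distinguishable from "one ordinary store".
# The `num_atomic_adds` field name is hypothetical.
def analyze_stores(store_ops):
    num_stores = 0
    num_atomic_adds = 0
    for op in store_ops:
        num_stores += 1          # atomic adds are counted as stores too
        if op == "atomic_add":
            num_atomic_adds += 1
    # These counts would be recorded in inductor_meta for the heuristic.
    return {"num_stores": num_stores, "num_atomic_adds": num_atomic_adds}
```

With only `num_stores`, a kernel with a single `tl.atomic_add` and a kernel with a single plain store both report 1; the extra field is what disambiguates them.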
Sounds good; I was just pointing to where we do similar analysis. Makes sense that you need a new field.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This config improves performance by 250% on some kernels that contain `tl.atomic_add(...)`. Again, we conditionalize for ROCm/HIP, so there is no impact to NV. Pull Request resolved: #166470. Approved by: https://github.com/PaulZhang12, https://github.com/mlazos, https://github.com/eellison, https://github.com/jansel
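The "conditionalize for ROCm/HIP" point can be sketched as a simple gate on the candidate list. The `is_hip` flag stands in for a check such as `torch.version.hip is not None` (non-None on ROCm builds); `candidate_configs` is a hypothetical helper, not inductor's real function.

```python
# Hedged sketch of gating extra ROCm-only configs so CUDA (NV) autotuning
# is completely unchanged. `is_hip` would come from something like
# `torch.version.hip is not None` in a real PyTorch build.
def candidate_configs(base_configs, extra_rocm_configs, is_hip):
    if is_hip:
        # ROCm/HIP: try the additional configs as well.
        return base_configs + extra_rocm_configs
    # CUDA and other backends: the candidate list is untouched.
    return base_configs
```

Because the extra configs are only appended on HIP builds, the autotuner's search space (and therefore its results) on NVIDIA hardware is byte-for-byte identical to before.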
…ports (#2807): These are backports based on the following upstream PRs; cherry-picks were performed where possible.

- pytorch#163908 (persistent reduction autotune)
- pytorch#161280 (reduction)
- pytorch#162053 (foreach)
- pytorch#163197 (pointwise)
- pytorch#166470 (pointwise config for atomic add)

Also included are some additional customer-specific configs which were not upstreamed but are in this backport to 2.9 (#2723). Did not backport filter functions such as `_maybe_filter_configs_for_tma_restrictions` (https://github.com/ROCm/pytorch/blob/release/2.9/torch/_inductor/runtime/triton_heuristics.py#L2614).

Co-authored-by: Jack Taylor <jack.taylor@amd.com>
Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>
Co-authored-by: Sampsa Riikonen <sriikone@amd.com>
Co-authored-by: AmdSampsa <sampsa.riikonen@amd.com>