[Inductor] Fix combo kernels for cpu backend #167781

Closed
karthickai wants to merge 24 commits into gh/karthickai/12/base from gh/karthickai/12/head

@pytorch-bot

pytorch-bot bot commented Nov 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167781

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 4 Unrelated Failures

As of commit f46c098 with merge base e770c95:

NEW FAILURE - The following job has failed:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

karthickai added a commit that referenced this pull request Nov 14, 2025
ghstack-source-id: e8bd682
Pull Request resolved: #167781
@karthickai karthickai marked this pull request as draft November 14, 2025 00:42
@karthickai karthickai added the release notes: inductor and ciflow/trunk labels Nov 14, 2025
@karthickai karthickai changed the title [WIP][Inductor] Fix combo kernels for cpu backend [Inductor] Fix combo kernels for cpu backend Nov 17, 2025
@karthickai karthickai requested a review from mlazos November 17, 2025 20:11
Fixes: #167780 


cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben mlazos

[ghstack-poisoned]
karthickai added a commit that referenced this pull request Nov 18, 2025
ghstack-source-id: 27e6b5d
Pull Request resolved: #167781
This PR fixes two issues
Fixes: #167780 combo_kernel fails with CppScheduling backend
Fixes: #168067 combo_kernel fails with mixed cpu/cuda nodes


cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben mlazos

[ghstack-poisoned]
@karthickai karthickai marked this pull request as ready for review November 18, 2025 20:53
subkernel_nodes = nodes
device = subkernel_nodes[0].get_device()

if not all(node.get_device() == device for node in subkernel_nodes):
Contributor

@mlazos mlazos Nov 19, 2025

Hmm, why does this case happen? We should only try to make combo kernels in the combo kernel pass from groups of nodes on the same device.

Collaborator Author

group_nodes_for_combo_kernels doesn't filter by device. While debugging this issue (#168067), I found a group with mixed devices:

> /home/karthickps/issues/torch/_inductor/scheduler.py(6127)speedup_by_combo_kernel()
-> breakpoint()
(Pdb) [node.get_device() for node in subkernel_nodes]
[device(type='cuda', index=0), device(type='cpu')]

The code was only checking device = subkernel_nodes[0].get_device() (the first node), so mixed-device groups slipped through. This check blocks them and fixes the mixed-device issue.
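
To illustrate why looking at the first node alone is not enough, here is a minimal sketch using a stand-in node class. `FakeNode` and `is_single_device` are hypothetical names chosen for this example and are not part of Inductor; devices are modeled as plain strings rather than `torch.device` objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeNode:
    """Hypothetical stand-in for a scheduler node; it only models get_device()."""

    name: str
    device: str

    def get_device(self) -> str:
        return self.device


def is_single_device(nodes):
    """Return True only if every node reports the same device as the first one."""
    device = nodes[0].get_device()
    return all(node.get_device() == device for node in nodes)


mixed = [FakeNode("a", "cuda:0"), FakeNode("b", "cpu")]
same = [FakeNode("a", "cuda:0"), FakeNode("c", "cuda:0")]

# Both groups start with a cuda:0 node, so the first node's device is identical;
# only the all(...) comparison over every node tells the two groups apart.
print(is_single_device(mixed))  # False
print(is_single_device(same))   # True
```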

Collaborator Author

Alternatively, we could add this check in _default_group_nodes_for_combo_kernels.

Collaborator Author

I discussed this with Michael. I'm moving this filter to _default_group_nodes_for_combo_kernels and adding an assert in speedup_by_combo_kernel.
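
The split of responsibilities described here (filter while grouping, assert at the point of use) can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: `Node`, `split_groups_by_device`, and `check_single_device` are hypothetical names, and devices are plain strings.

```python
from collections import defaultdict


class Node:
    """Hypothetical stand-in for a scheduler node."""

    def __init__(self, name, device):
        self.name = name
        self.device = device

    def get_device(self):
        return self.device


def split_groups_by_device(groups):
    """Split each candidate group so that every resulting group has one device."""
    result = []
    for group in groups:
        by_device = defaultdict(list)
        for node in group:
            by_device[node.get_device()].append(node)
        # Dicts keep insertion order, so devices come out in first-seen order.
        result.extend(by_device.values())
    return result


def check_single_device(nodes):
    """Downstream guard: fail loudly if a mixed-device group still arrives."""
    device = nodes[0].get_device()
    assert all(n.get_device() == device for n in nodes), "mixed devices in group"
    return device


candidates = [[Node("a", "cuda:0"), Node("b", "cpu"), Node("c", "cuda:0")]]
groups = split_groups_by_device(candidates)
print([[n.name for n in g] for g in groups])  # [['a', 'c'], ['b']]
print([check_single_device(g) for g in groups])  # ['cuda:0', 'cpu']
```

Filtering at grouping time keeps invalid groups from being built at all, while the assert documents the invariant for later callers and catches regressions if a custom grouping function skips the filter.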

@pytorchmergebot
Collaborator

Starting merge as part of PR stack under #168109

karthickai added a commit that referenced this pull request Dec 2, 2025
ghstack-source-id: affe011
Pull Request resolved: #167781
@karthickai
Collaborator Author

@pytorchbot merge -i

@pytorchmergebot
Collaborator

pytorchmergebot pushed a commit that referenced this pull request Dec 5, 2025
…rgs (#168127)

Fixes: #168124
This PR fixes Triton compilation failures in combo kernels when combining multiple kernels with random ops (or any ops that create args with a value equal to 1). The fix adds the missing logic to populate `constants` for args marked as compile-time constants, matching the behavior of regular Triton kernels.

Pull Request resolved: #168127
Approved by: https://github.com/mlazos
ghstack dependencies: #167781
umechand-amd pushed a commit to ROCm/pytorch that referenced this pull request Dec 8, 2025
This PR fixes two issues
Fixes: pytorch#167780 combo_kernel fails with CppScheduling backend
Fixes: pytorch#168067 combo_kernel fails with mixed cpu/cuda nodes

Pull Request resolved: pytorch#167781
Approved by: https://github.com/mlazos
umechand-amd pushed a commit to ROCm/pytorch that referenced this pull request Dec 8, 2025
JacobSzwejbka pushed a commit that referenced this pull request Dec 8, 2025
JacobSzwejbka pushed a commit that referenced this pull request Dec 8, 2025
tiendatngcs pushed a commit to tiendatngcs/pytorch-Dec25 that referenced this pull request Dec 10, 2025
@github-actions github-actions bot deleted the gh/karthickai/12/head branch January 4, 2026 02:22
Labels

ciflow/inductor, ciflow/rocm, ciflow/trunk, Merged, module: inductor, release notes: inductor
