[Inductor] Fix combo kernels for cpu backend #167781
karthickai wants to merge 24 commits into gh/karthickai/12/base
Dr. CI: see artifacts and rendered test results at hud.pytorch.org/pr/167781
As of commit f46c098 with merge base e770c95: 1 new failure, 4 unrelated failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
torch/_inductor/scheduler.py (outdated)
    subkernel_nodes = nodes
    device = subkernel_nodes[0].get_device()

    if not all(node.get_device() == device for node in subkernel_nodes):
Hmm, why does this case happen? We should only try to make combo kernels in the combo kernel pass from groups of nodes on the same device.
group_nodes_for_combo_kernels doesn't filter by device. While debugging this issue (#168067), I found groups with mixed devices:
    > /home/karthickps/issues/torch/_inductor/scheduler.py(6127)speedup_by_combo_kernel()
    -> breakpoint()
    (Pdb) [node.get_device() for node in subkernel_nodes]
    [device(type='cuda', index=0), device(type='cpu')]
The code was only checking device = subkernel_nodes[0].get_device() (the first node), so mixed-device groups slipped through. This check blocks them and resolves the mixed-device issue.
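To illustrate the point above, here is a minimal sketch of why inspecting only the first node hides a mixed-device group. `FakeNode`, `first_node_device`, and `is_single_device` are hypothetical stand-ins written for this example; they are not Inductor classes or functions, and devices are modelled as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeNode:
    """Hypothetical stand-in for a scheduler node (not an Inductor class)."""

    name: str
    device: str

    def get_device(self) -> str:
        return self.device


def first_node_device(nodes):
    # Old behaviour as described in the thread: trust the first node's device
    # for the whole group.
    return nodes[0].get_device()


def is_single_device(nodes):
    # The added check: every node must share the first node's device.
    device = nodes[0].get_device()
    return all(node.get_device() == device for node in nodes)


mixed = [FakeNode("buf0", "cuda:0"), FakeNode("buf1", "cpu")]
same = [FakeNode("buf2", "cpu"), FakeNode("buf3", "cpu")]

print(first_node_device(mixed))  # the cpu node is silently ignored
print(is_single_device(mixed), is_single_device(same))
```

With the first-node lookup alone, the `mixed` group looks like a pure CUDA group; the `all(...)` check is what distinguishes it from `same`.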
Alternatively, we can add this check in _default_group_nodes_for_combo_kernels.
I discussed this with Michael. I'm moving this filter to _default_group_nodes_for_combo_kernels and adding an assert in speedup_by_combo_kernel.
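The arrangement agreed on here (filter at grouping time, assert at use time) could look roughly like the sketch below. It is not the code from the PR: `FakeNode`, `group_nodes_by_device`, and `check_combo_group` are hypothetical names, devices are plain strings, and dropping single-node groups is an assumption made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeNode:
    """Hypothetical stand-in for a scheduler node (not an Inductor class)."""

    name: str
    device: str

    def get_device(self) -> str:
        return self.device


def group_nodes_by_device(nodes):
    """Partition nodes so that each candidate group lives on one device."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node.get_device()].append(node)
    # Assumption: a group with a single node has nothing to combine.
    return [group for group in groups.values() if len(group) > 1]


def check_combo_group(nodes):
    """Downstream guard: fail loudly if a mixed-device group ever arrives."""
    device = nodes[0].get_device()
    assert all(n.get_device() == device for n in nodes), "mixed-device combo group"
    return device


nodes = [
    FakeNode("a", "cuda:0"),
    FakeNode("b", "cpu"),
    FakeNode("c", "cuda:0"),
    FakeNode("d", "cpu"),
    FakeNode("e", "cuda:1"),
]
groups = group_nodes_by_device(nodes)
devices = [check_combo_group(group) for group in groups]
```

Filtering where groups are formed keeps mixed-device groups from being created at all, while the assert documents the invariant for any custom grouping function that might be plugged in later.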
Starting merge as part of PR stack under #168109
@pytorchbot merge -i
Merge started
Your change will be merged while ignoring the following 5 checks: pull / linux-jammy-py3.14-clang12 / test (default, 3, 5, linux.4xlarge), pull / linux-jammy-py3.14-clang12 / test (default, 2, 5, linux.4xlarge), pull / linux-jammy-py3.14-clang12 / test (default, 5, 5, linux.4xlarge), pull / linux-jammy-py3.14-clang12 / test (default, 4, 5, linux.4xlarge), trunk / linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 4, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu)
…rgs (#168127)
Fixes: #168124
This PR fixes Triton compilation failures in combo kernels when combining multiple kernels with random ops (or any ops that create args with a value equal to 1). The fix adds the missing logic to populate the `constants` for args marked as compile-time constants, matching the behavior of regular Triton kernels.
Pull Request resolved: #168127
Approved by: https://github.com/mlazos
ghstack dependencies: #167781
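As a rough illustration of the idea in this commit message, the sketch below collects a `constants` mapping from a kernel's arguments. It is only a plain-Python model under stated assumptions: `ArgSpec` and `build_constants` are hypothetical names, and the rule "an integer argument equal to 1 is treated as a compile-time constant" is taken from the description above rather than from the actual Inductor or Triton code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArgSpec:
    """Hypothetical description of one kernel argument."""

    name: str
    value: object
    is_constexpr: bool = False


def build_constants(args):
    """Return {arg name: value} for args treated as compile-time constants."""
    constants = {}
    for arg in args:
        # bool is a subclass of int in Python, so exclude it explicitly.
        equal_to_one = (
            isinstance(arg.value, int)
            and not isinstance(arg.value, bool)
            and arg.value == 1
        )
        if arg.is_constexpr or equal_to_one:
            constants[arg.name] = arg.value
    return constants


args = [
    ArgSpec("in_ptr0", None),
    ArgSpec("seed_offset", 1),
    ArgSpec("xnumel", 1024),
    ArgSpec("XBLOCK", 128, is_constexpr=True),
]
constants = build_constants(args)
```

In this model, leaving `seed_offset` out of `constants` would be the analogue of the missing logic the commit describes: the argument is specialized to 1 but never recorded as a constant.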
This PR fixes two issues
Fixes: pytorch#167780 combo_kernel fails with CppScheduling backend
Fixes: pytorch#168067 combo_kernel fails with mixed cpu/cuda nodes
Pull Request resolved: pytorch#167781
Approved by: https://github.com/mlazos
ghstack-source-id: ba6c7d2 Pull Request resolved: pytorch/pytorch#167781
ghstack-source-id: 7156ce0 Pull Request resolved: pytorch/pytorch#167781
Stack from ghstack (oldest at bottom):
This PR fixes two issues:
Fixes: #167780 combo_kernel fails with CppScheduling backend
Fixes: #168067 combo_kernel fails with mixed cpu/cuda nodes
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @mlazos @chenyang78