[inductor] Reduce cold compilation time caused by duplicated user-defined Triton kernels by desertfire · Pull Request #168292 · pytorch/pytorch

desertfire · 2025-11-20T19:30:14Z

Stack from ghstack (oldest at bottom):

-> [inductor] Reduce cold compilation time caused by duplicated user-defined Triton kernels #168292
[inductor] Fix a user-defined Triton kernel output + .cpu() correctness issue #168281

Summary: Similar to #167132, but the previous PR didn't consider user-defined Triton kernels. When cudagraphs-partition is enabled in Inductor, different partitions can use the same user-defined Triton kernels. Each user-defined Triton kernel should be defined and compiled only once.

Local measurements show that this PR reduces Qwen/Qwen3-VL-235B-A22B-Instruct's cold compilation time from 243.65s to 114.69s.
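For readers unfamiliar with the idea, here is a minimal sketch of defining a kernel only once when several partitions request it. It is not the code in this PR and does not use Inductor's internal APIs; `KernelRegistry`, `define`, and `ADD_KERNEL_SRC` are hypothetical names, and the kernel source is placeholder text rather than real Triton.

```python
import hashlib


class KernelRegistry:
    """Hypothetical registry: each distinct kernel source is defined once."""

    def __init__(self):
        self._name_by_digest = {}  # source digest -> assigned kernel name
        self.definitions = []      # (name, source) pairs that would be emitted

    def define(self, name_hint, source):
        # Key on the kernel source so identical kernels share one definition.
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        cached = self._name_by_digest.get(digest)
        if cached is not None:
            return cached  # already defined; reuse the existing name
        name = f"{name_hint}_{len(self.definitions)}"
        self._name_by_digest[digest] = name
        self.definitions.append((name, source))
        return name


# Placeholder kernel text (an assumption for illustration, not real Triton).
ADD_KERNEL_SRC = "def add_kernel(x_ptr, y_ptr, out_ptr, n): ..."

registry = KernelRegistry()
# Three partitions request the same user-defined kernel.
partition_kernels = [
    registry.define("add_kernel", ADD_KERNEL_SRC) for _ in range(3)
]
```

All three partitions receive the same kernel name, and only one definition is recorded, so the compile step would run once instead of once per partition.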

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @mlazos

pytorch-bot · 2025-11-20T20:02:53Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/168292

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

desertfire · 2025-11-21T22:40:36Z

@pytorchbot merge

pytorchmergebot · 2025-11-21T22:42:26Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

…ined Triton kernels (#168292)

Summary: Similar to #167132, but the previous PR didn't consider user-defined Triton kernels. When cudagraphs-partition is enabled in Inductor, different partitions can use the same user-defined Triton kernels. Each user-defined Triton kernel should be defined and compiled only once.

Local measurements show that this PR reduces Qwen/Qwen3-VL-235B-A22B-Instruct's cold compilation time from 243.65s to 114.69s.

Pull Request resolved: #168292
Approved by: https://github.com/eellison
ghstack dependencies: #168281

desertfire mentioned this pull request Nov 20, 2025

[inductor] Fix a user-defined Triton kernel output + .cpu() correctness issue #168281

Closed

pytorch-bot bot added ciflow/inductor module: inductor labels Nov 20, 2025

desertfire added the release notes: inductor label Nov 20, 2025

desertfire requested review from BoyuanFeng and eellison November 21, 2025 13:02

eellison approved these changes Nov 21, 2025

View reviewed changes

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 21, 2025

pytorchmergebot added the merging label Nov 21, 2025

pytorchmergebot added the Merged label Nov 22, 2025

pytorchmergebot closed this in 7ec5c16 Nov 22, 2025

pytorchmergebot removed the merging label Nov 22, 2025

github-actions bot deleted the gh/desertfire/611/head branch December 22, 2025 02:20

desertfire wants to merge 2 commits into gh/desertfire/611/base from gh/desertfire/611/head
