[Triton] [Inductor] Pruned failed compilations from Autotuning candidates by njriasan · Pull Request #162673 · pytorch/pytorch

njriasan · 2025-09-11T00:43:45Z

Summary:
When exhaustively autotuning a new template, you may hit configurations that fail to compile. Such a template was still autotuned because nothing marked it as failed, and in my experiments this led to a crash/segfault unless I set `TORCHINDUCTOR_AUTOTUNE_IN_SUBPROC=1`.

To help eliminate this issue, this PR marks any template that fails to compile as "failed" and then removes all failed templates from the choice candidates. In cases where a candidate would previously have failed to compile twice, this should at least reduce compilation time.
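The mark-and-prune step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Inductor's implementation: `Choice`, `compile_fn`, `failed`, and `precompile_and_prune` are hypothetical names, and `NoValidChoicesError` here is only a local stand-in for the exception of that name in `torch/_inductor/select_algorithm.py`.

```python
from dataclasses import dataclass
from typing import Callable, List


class NoValidChoicesError(RuntimeError):
    """Local stand-in: raised when no candidate survives pruning."""


@dataclass
class Choice:
    # Hypothetical autotuning candidate, e.g. one template configuration.
    name: str
    compile_fn: Callable[[], None]  # raises if the kernel fails to compile
    failed: bool = False


def precompile_and_prune(choices: List[Choice]) -> List[Choice]:
    """Compile each candidate once, mark failures, and drop them before benchmarking."""
    for choice in choices:
        try:
            choice.compile_fn()
        except Exception as exc:  # a compile error disqualifies only this candidate
            choice.failed = True
            print(f"pruning {choice.name}: {exc!r}")
    survivors = [c for c in choices if not c.failed]
    if not survivors:
        raise NoValidChoicesError("every autotuning candidate failed to compile")
    return survivors


if __name__ == "__main__":
    def ok() -> None:
        pass

    def broken() -> None:
        # Simulates a compiler bug such as the num_warps < 4 issue in the test plan.
        raise RuntimeError("compilation failed for num_warps=2")

    candidates = [Choice("cfg_a", ok), Choice("cfg_b", broken), Choice("cfg_c", ok)]
    print([c.name for c in precompile_and_prune(candidates)])  # ['cfg_a', 'cfg_c']
```

Benchmarking then only sees the survivors, so a candidate that cannot compile is neither compiled a second time nor launched.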

Test Plan:
Tested locally while experimenting with the new Blackwell templates and a Triton version that contains a bug related to `num_warps < 4`.

Rollback Plan:

Differential Revision: D82172207

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @mlazos

pytorch-bot · 2025-09-11T00:43:48Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162673

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 28907b4 with merge base f654cff ():

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

inductor / unit-test / inductor-cpu-test / test (inductor_amx, 1, 2, linux.8xlarge.amx) (gh) (trunk failure)
distributed/test_dynamo_distributed.py::TestFakeDistributedSingleProc::test_hf_bert_ddp_aot_eager
inductor / unit-test / inductor-cpu-test / test (inductor_avx2, 1, 2, linux.10xlarge.avx2) (gh) (trunk failure)
distributed/test_dynamo_distributed.py::TestFakeDistributedSingleProc::test_hf_bert_ddp_aot_eager

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-09-11T00:44:02Z

This pull request was exported from Phabricator. Differential Revision: D82172207


facebook-github-bot · 2025-09-11T02:17:58Z

This pull request was exported from Phabricator. Differential Revision: D82172207

facebook-github-bot · 2025-09-11T02:18:04Z

This pull request was exported from Phabricator. Differential Revision: D82172207

PaulZhang12

This is an awesome fix! Thank you, cc @eellison for any thoughts

njriasan · 2025-09-11T17:33:15Z

@pytorchbot merge

pytorchmergebot · 2025-09-11T17:35:02Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here.

mlazos · 2025-09-11T21:16:34Z

torch/_inductor/select_algorithm.py

```diff
                else "max_autotune_conv_backends"
            )
-            raise NoValidChoicesError(
+            return NoValidChoicesError(
```


It might be good to modify this to indicate the reason there are no valid choices (e.g. no compilable choices vs. no choices at the beginning).

I had some plans to update this, since this is by far the most common error I've seen with users. They usually end up adding ATen, but it would be useful to know why.

Happy to submit a follow-up. Thanks for the suggestion.
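One way to carry the suggested reason in the error is sketched below. It is a hypothetical design, not the follow-up that was actually submitted: `NoValidChoicesReason`, `no_valid_choices_error`, and `select_survivors` are invented names, and `NoValidChoicesError` is a local stand-in. Only the exception name and the `max_autotune_*_backends` config names come from the diff above. Like that diff, the helper returns the error instead of raising it, leaving the raise to the caller.

```python
from enum import Enum
from typing import List, Sequence


class NoValidChoicesReason(Enum):
    # Hypothetical reasons; these are not part of the current Inductor API.
    NO_CANDIDATES = "no candidates were generated for the enabled backends"
    ALL_FAILED_TO_COMPILE = "every generated candidate failed to compile"


class NoValidChoicesError(RuntimeError):
    # Local stand-in for Inductor's exception of the same name, plus a reason field.
    def __init__(self, reason: NoValidChoicesReason, backends_config: str) -> None:
        self.reason = reason
        super().__init__(
            f"No valid choices: {reason.value}. "
            f"Consider adding ATEN to the {backends_config} config."
        )


def no_valid_choices_error(num_initial: int, backends_config: str) -> NoValidChoicesError:
    # Build (rather than raise) the error so the caller decides where to raise it.
    reason = (
        NoValidChoicesReason.NO_CANDIDATES
        if num_initial == 0
        else NoValidChoicesReason.ALL_FAILED_TO_COMPILE
    )
    return NoValidChoicesError(reason, backends_config)


def select_survivors(
    initial: Sequence[str], compiled_ok: Sequence[str], backends_config: str
) -> List[str]:
    # compiled_ok is assumed to be the subset of initial that compiled successfully.
    if not compiled_ok:
        raise no_valid_choices_error(len(initial), backends_config)
    return list(compiled_ok)
```

With this shape, users who hit the error can tell whether their backend config produced no candidates at all or whether every candidate was pruned for failing to compile.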

…ates (pytorch#162673) Summary: When exhaustively autotuning a new template, you may hit configurations that fail to compile. Such a template was still autotuned because nothing marked it as failed, and in my experiments this led to a crash/segfault unless I set `TORCHINDUCTOR_AUTOTUNE_IN_SUBPROC=1`. To help eliminate this issue, this PR marks any template that fails to compile as "failed" and then removes all failed templates from the choice candidates. In cases where a candidate would previously have failed to compile twice, this should at least reduce compilation time. Test Plan: Tested locally while experimenting with the new Blackwell templates and a Triton version that contains a bug related to `num_warps < 4`. Rollback Plan: Differential Revision: D82172207 Pull Request resolved: pytorch#162673 Approved by: https://github.com/PaulZhang12, https://github.com/mlazos

pytorch-bot bot added ciflow/inductor module: inductor labels Sep 11, 2025

facebook-github-bot added the fb-exported label Sep 11, 2025

njriasan requested review from PaulZhang12 and eellison September 11, 2025 00:44

njriasan added topic: improvements topic category release notes: inductor labels Sep 11, 2025

njriasan force-pushed the export-D82172207 branch from fd45ee6 to aa49bfd Compare September 11, 2025 02:17

njriasan force-pushed the export-D82172207 branch from aa49bfd to 28907b4 Compare September 11, 2025 02:17

PaulZhang12 approved these changes Sep 11, 2025

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 11, 2025

pytorchmergebot added the merging label Sep 11, 2025

mlazos reviewed Sep 11, 2025

mlazos approved these changes Sep 11, 2025

pytorchmergebot added the Merged label Sep 11, 2025

pytorchmergebot closed this in 9614c2e Sep 11, 2025

pytorchmergebot removed the merging label Sep 11, 2025


njriasan wants to merge 1 commit into pytorch:main from njriasan:export-D82172207