[Triton] [Inductor] Pruned failed compilations from Autotuning candidates#162673
njriasan wants to merge 1 commit into pytorch:main
This pull request was exported from Phabricator. Differential Revision: D82172207
Force-pushed from fd45ee6 to aa49bfd.
…ates (pytorch#162673) Summary: When exhaustively autotuning a new template, you may hit configurations that fail to compile. The template would still be autotuned because nothing marked it as failed, which in my experiments led to a crash/segfault unless I set `TORCHINDUCTOR_AUTOTUNE_IN_SUBPROC=1`. To help eliminate this issue, this PR marks any template that fails to compile as "failed" and then removes all failed templates from the choice candidates. Even in cases where the template would simply have failed to compile a second time, this should at least reduce compilation time. Test Plan: Tested locally while experimenting with the new Blackwell templates and a Triton version that contains a bug related to `num_warps < 4`. Rollback Plan: Differential Revision: D82172207
Force-pushed from aa49bfd to 28907b4.
PaulZhang12 left a comment
This is an awesome fix! Thank you, cc @eellison for any thoughts
@pytorchbot merge
else "max_autotune_conv_backends"
)
raise NoValidChoicesError(
return NoValidChoicesError(
It might be good to modify this to indicate why there are no valid choices (e.g. no compilable choices vs. no choices at the beginning).
I had some plans to update this, since this is by far the most common error I've seen users hit. They usually end up adding aten, but it would be useful to know why.
Happy to submit a followup. Thanks for the suggestion.
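One way the suggested follow-up could look is sketched below. This is only an illustration of attaching a reason to the error: `NoChoicesReason`, `NoValidChoicesSketchError`, `diagnose`, and `ensure_valid_choices` are hypothetical names invented here, not Inductor's `NoValidChoicesError` or any existing PyTorch API.

```python
from enum import Enum
from typing import Optional, Sequence


class NoChoicesReason(Enum):
    """Hypothetical reasons for ending up with no usable candidates."""
    NONE_PROVIDED = "no choices were provided to begin with"
    ALL_FAILED_TO_COMPILE = "every choice failed to compile"


class NoValidChoicesSketchError(RuntimeError):
    """Hypothetical error type that records why no choice is available."""

    def __init__(self, reason: NoChoicesReason) -> None:
        super().__init__(f"No valid choices: {reason.value}")
        self.reason = reason


def diagnose(initial: Sequence[str], compiled: Sequence[str]) -> Optional[NoChoicesReason]:
    """Return the reason there is nothing to benchmark, or None if there is."""
    if not initial:
        return NoChoicesReason.NONE_PROVIDED
    if not compiled:
        return NoChoicesReason.ALL_FAILED_TO_COMPILE
    return None


def ensure_valid_choices(initial: Sequence[str], compiled: Sequence[str]) -> None:
    """Raise an error that carries the reason when no choice survives."""
    reason = diagnose(initial, compiled)
    if reason is not None:
        raise NoValidChoicesSketchError(reason)
```

With this shape, a caller can tell "nothing was offered" apart from "everything failed to compile" by inspecting `err.reason` instead of parsing the message.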
…ates (pytorch#162673) Summary: When exhaustively autotuning a new template, you may hit configurations that fail to compile. The template would still be autotuned because nothing marked it as failed, which in my experiments led to a crash/segfault unless I set `TORCHINDUCTOR_AUTOTUNE_IN_SUBPROC=1`. To help eliminate this issue, this PR marks any template that fails to compile as "failed" and then removes all failed templates from the choice candidates. Even in cases where the template would simply have failed to compile a second time, this should at least reduce compilation time. Test Plan: Tested locally while experimenting with the new Blackwell templates and a Triton version that contains a bug related to `num_warps < 4`. Rollback Plan: Differential Revision: D82172207 Pull Request resolved: pytorch#162673 Approved by: https://github.com/PaulZhang12, https://github.com/mlazos
Summary:
When exhaustively autotuning a new template, you may hit configurations that fail to compile. The template would still be autotuned because nothing marked it as failed, which in my experiments led to a crash/segfault unless I set `TORCHINDUCTOR_AUTOTUNE_IN_SUBPROC=1`.
To help eliminate this issue, this PR marks any template that fails to compile as "failed" and then removes all failed templates from the choice candidates. Even in cases where the template would simply have failed to compile a second time, this should at least reduce compilation time.
Test Plan:
Tested locally while experimenting with the new Blackwell templates and a Triton version that contains a bug related to `num_warps < 4`.
Rollback Plan:
Differential Revision: D82172207
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @mlazos