[inductor] fix issue for example value with unbacked strides #163660
sevenEng wants to merge 2 commits into pytorch:main.
Conversation
Dr. CI: As of commit 0d8f1f5 with merge base 3a110c9, you can merge normally (1 unrelated failure, likely flakiness already present on trunk). See artifacts and rendered test results at hud.pytorch.org/pr/163660.
Dumping some thoughts here, since this would be used for torch.compile GEMM autotuning too:
Why do I care about pytorch/torch/_inductor/sizevars.py, lines 750 to 769 (at 1495b35)? I believe the answer to both (1) and (2) is Yes. AFAIK, every time we run Inductor we create a brand new GraphLowering (pytorch/torch/_inductor/compile_fx.py, line 1391 at 1495b35), and when we create it we also create a fresh SizeVarAllocator (pytorch/torch/_inductor/graph.py, line 375 at 1495b35).
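To illustrate the lifetime argument, here is a paraphrased structural sketch; it is not the actual graph.py / compile_fx.py source, and the class bodies are stand-ins for the real ones:

```python
# Paraphrased structural sketch, not the actual Inductor source: each compile
# builds a fresh GraphLowering, which builds a fresh SizeVarAllocator, so any
# size-hint caching on the allocator is scoped to a single compile and cannot
# leak across Inductor runs.
class SizeVarAllocator:
    def __init__(self, shape_env):
        self.shape_env = shape_env
        self.hint_cache = {}  # compile-local; discarded with the allocator

class GraphLowering:
    def __init__(self, gm, shape_env):
        self.sizevars = SizeVarAllocator(shape_env)  # new allocator per graph

def run_inductor_once(gm, shape_env):
    return GraphLowering(gm, shape_env)  # new lowering (and allocator) per run
```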
V.graph.sizevars.size_hints(
    node.get_stride(),
    fallback=config.unbacked_symint_fallback,
    hint_override=hint_override,
Yeah, we should definitely keep `hint_override` here.
We should either make `atomically_apply_size_hint` available as an option in `size_hints`, or the other way around.
cc: @bobrenjc93 @pianpwk @ezyang for any thoughts
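As a rough, hypothetical sketch of that suggestion (the helper `atomic_size_hints` and its signature are invented for illustration and are not the current `SizeVarAllocator` API), one way atomic substitution and a `hint_override` knob could coexist:

```python
# Hypothetical sketch only; atomic_size_hints is an invented stand-in, not the
# real SizeVarAllocator API. It shows hint_override and atomic substitution
# working together on symbolic stride expressions.
import sympy

def atomic_size_hints(exprs, *, fallback, hint_override=None):
    exprs = [sympy.sympify(e) for e in exprs]

    def symbol_hint(sym):
        # The real code would consult the ShapeEnv; here we only model
        # "use the override if given, otherwise the unbacked fallback".
        return hint_override if hint_override is not None else fallback

    # One hint per free symbol, applied to every expression in a single
    # substitution, so related strides stay mutually consistent.
    subs = {s: symbol_hint(s) for e in exprs for s in e.free_symbols}
    return tuple(int(e.subs(subs)) for e in exprs)

u0 = sympy.Symbol("u0", positive=True, integer=True)
print(atomic_size_hints([128 * u0, 128, 1], fallback=8192))                    # (1048576, 128, 1)
print(atomic_size_hints([128 * u0, 128, 1], fallback=8192, hint_override=64))  # (8192, 128, 1)
```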
@pytorchbot merge
Merge started: your change will be merged once all checks pass (ETA 0-4 hours).
…#163660)
## Issue
During autotune, we're not applying size hints atomically to the example inputs used for benchmarking. If an unbacked symint shows up in the inputs' strides, this can lead to CUDA IMA. The added unittest reproduces it: with a stride of `[128 * u0, 128, 1]` and an unbacked fallback of 8192, calling `benchmark_example_value` returns a tensor with stride `[8192, 128, 1]` instead of `[128 * 8192, 128, 1]`.
## Fix
Use the atomic API when applying size hints to the input tensors' strides.
Pull Request resolved: pytorch#163660
Approved by: https://github.com/ColinPeppler
## Issue
During autotune, we're not applying size hints atomically to the example inputs used for benchmarking. If an unbacked symint shows up in the inputs' strides, this can lead to CUDA IMA. The added unittest reproduces it: with a stride of `[128 * u0, 128, 1]` and an unbacked fallback of 8192, calling `benchmark_example_value` returns a tensor with stride `[8192, 128, 1]` instead of `[128 * 8192, 128, 1]`.
## Fix
Use the atomic API when applying size hints to the input tensors' strides.
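For illustration, a minimal standalone sketch of the discrepancy described above; it uses sympy directly and is not the Inductor code or the added unittest:

```python
# Minimal standalone sketch, not the Inductor implementation: u0 models the
# unbacked symint in the stride and 8192 is the unbacked fallback from the
# example above.
import sympy

u0 = sympy.Symbol("u0", positive=True, integer=True)
strides = [128 * u0, sympy.Integer(128), sympy.Integer(1)]
FALLBACK = 8192

# Non-atomic: any stride expression still containing an unbacked symbol is
# replaced wholesale by the fallback, losing the "128 *" factor.
non_atomic = [int(s) if s.is_number else FALLBACK for s in strides]
print(non_atomic)  # [8192, 128, 1]

# Atomic: substitute one value for u0 and evaluate every expression under that
# same substitution, keeping the strides consistent with each other.
atomic = [int(s.subs({u0: FALLBACK})) for s in strides]
print(atomic)      # [1048576, 128, 1], i.e. [128 * 8192, 128, 1]
```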
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben