[XPU][Fix] Register convolution_overrideable for flops count #166839
Stonepia wants to merge 4 commits into pytorch:main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166839
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: There is 1 currently active SEV. If your PR is affected, please view it below.
✅ No Failures: As of commit 4ef8cb5 with merge base 392acee.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "module: xpu" |
@pytorchbot label "topic: not user facing" |
| "Requires XPU or CUDA SM80", | ||
| ) | ||
| @skipXPUIf(TEST_WITH_SLOW, "Skip because test too slow on XPU") | ||
| @dtypes(torch.float, torch.float16) |
Should we add convolution_overrideable in line 479?
Well, the logic already has an `or` clause that matches it, so we don't need to add it explicitly. But yeah, adding it would be more readable.
if name.startswith(
    (
        "aten::cudnn_convolution",
        "aten::convolution",
        "aten::_convolution",
    )
) or "conv" in name
To add the ciflow label, please first approve the workflows that are awaiting approval. This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes #166838

1. Register the `convolution_overrideable` key for the flop counter. CUDA relies on keys with `cudnn_convolution`; devices like XPU fall back to `convolution_overrideable`. Without the correct registration, the flop counter silently returns 0 for XPU at: https://github.com/pytorch/pytorch/blob/e1d011d6eb571cd98ec7c7ed8e8b518a5463ec97/torch/_inductor/analysis/profile_analysis.py#L178-L179
2. Enable the tests in `test_analysis.py` when XPU is enabled.

Pull Request resolved: #166839
Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/jansel
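A minimal sketch of what registering a flop formula for the overrideable convolution key can look like, using the `register_flop_formula` and `conv_flop_count` helpers from `torch.utils.flop_counter`; this is an illustration under those assumptions, not necessarily the exact diff merged here:

```python
import torch
from torch.utils.flop_counter import register_flop_formula, conv_flop_count

aten = torch.ops.aten

# Reuse the generic convolution flop math for the backend-overrideable op, so
# backends that dispatch to convolution_overrideable (e.g. XPU) are counted
# the same way as the CUDA/cudnn convolution keys.
@register_flop_formula(aten.convolution_overrideable)
def conv_overrideable_flop(
    x_shape, w_shape, bias, stride, padding, dilation, transposed,
    *args, out_shape=None, **kwargs
) -> int:
    return conv_flop_count(x_shape, w_shape, out_shape, transposed=transposed)
```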
This PR enables XPU devices in `test_analysis.py`. For performance reasons it skips some slow tests; the full scope can be enabled with:
```
export PYTORCH_TEST_WITH_SLOW=1
```
**PR Stack:**
- #166840: enables the tests and skips the ones that failed
- #166839: fixes the bug and enables the full tests for XPU

**Some skipped test times:**
```
test_augment_trace_against_flop_counter_maxat0_xpu_float16 [49.0863s]
test_augment_trace_against_flop_counter_maxat0_xpu_float32 [18.2268s]
test_augment_trace_against_flop_counter_maxat1_xpu_float16 [85.6549s]
test_augment_trace_against_flop_counter_maxat1_xpu_float32 [329.0832s]
test_augment_trace_against_flop_counter_maxat2_xpu_float16 [24.4825s]
test_augment_trace_against_flop_counter_maxat2_xpu_float32 [19.0688s]
```
Pull Request resolved: #166840
Approved by: https://github.com/guangyey, https://github.com/jansel
Fixes #166838
1. Register the `convolution_overrideable` key for the flop counter. CUDA relies on keys with `cudnn_convolution`; for devices like XPU, dispatch falls back to `convolution_overrideable`. Without the correct registration, the flop counter will silently return 0 for XPU at pytorch/torch/_inductor/analysis/profile_analysis.py, lines 178 to 179 in e1d011d.
2. Enable the tests in `test_analysis.py` when XPU is enabled.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @gujinghui @fengyuan14 @guangyey
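For reference, the flop-counter utility that `test_analysis.py` compares profiler traces against can be exercised directly. A generic usage sketch (the `xpu` device string assumes an XPU-enabled build; module and shapes are arbitrary). Per the description above, conv FLOPs for XPU were previously reported as 0 in the analysis path; with `convolution_overrideable` registered, a non-zero count is expected here as well:

```python
import torch
from torch.utils.flop_counter import FlopCounterMode

# Count FLOPs for a single convolution forward pass on an XPU device.
conv = torch.nn.Conv2d(3, 16, kernel_size=3).to("xpu")
x = torch.randn(1, 3, 32, 32, device="xpu")

with FlopCounterMode(display=False) as counter:
    conv(x)

print(counter.get_total_flops())  # total FLOPs recorded for the conv forward
```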