
[XPU][Fix] Register convolution_overrideable for flops count #166839

Closed
Stonepia wants to merge 4 commits into pytorch:main from Stonepia:tong/conv_flop

Conversation

@Stonepia
Contributor

@Stonepia Stonepia commented Nov 3, 2025

Fixes #166838

  1. Register the convolution_overrideable key for the flop counter. CUDA relies on keys such as cudnn_convolution; devices like XPU fall back to convolution_overrideable. Without this registration, the flop_counter silently returns 0 for XPU at the line below (a minimal registry-lookup sketch follows this list):

    if op_obj is None or op_obj not in flop_registry:
        return 0

  2. Enable the corresponding tests for XPU in test_analysis.py.
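
A minimal sketch of the registry lookup behind this fix, assuming torch.utils.flop_counter still exposes the module-level flop_registry dict (illustrative only, not the PR's actual code):

```python
# Sketch: check whether the XPU convolution op resolves in the flop registry.
# Before this PR the lookup failed, so the analysis hit the "return 0" branch;
# with the registration in place the op maps to the existing conv FLOP formula.
import torch
from torch.utils.flop_counter import flop_registry

op = torch.ops.aten.convolution_overrideable
print(op in flop_registry)  # expected True once convolution_overrideable is registered
```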

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @gujinghui @fengyuan14 @guangyey

@pytorch-bot

pytorch-bot bot commented Nov 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166839

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 4ef8cb5 with merge base 392acee:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Stonepia
Contributor Author

Stonepia commented Nov 3, 2025

@pytorchbot label "module: xpu"

@pytorch-bot pytorch-bot bot added the module: xpu Intel XPU related issues label Nov 3, 2025
@Stonepia Stonepia changed the title from "[XPU][Bug] Register convolution_overrideable for flops count" to "[XPU][Fix] Register convolution_overrideable for flops count" Nov 3, 2025
@Stonepia
Contributor Author

Stonepia commented Nov 3, 2025

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Nov 3, 2025
@Stonepia Stonepia marked this pull request as ready for review November 3, 2025 06:56
"Requires XPU or CUDA SM80",
)
@skipXPUIf(TEST_WITH_SLOW, "Skip because test too slow on XPU")
@dtypes(torch.float, torch.float16)
Collaborator

Should we add convolution_overrideable in line 479?

Contributor Author

Well, the logic already has the `or` branch, so we don't need to add it explicitly. But yes, adding it would make the code more readable.

if name.startswith(
    (
        "aten::cudnn_convolution",
        "aten::convolution",
        "aten::_convolution",
    )
) or "conv" in name

Contributor Author

Added in f583ecf
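
For reference, a sketch of what spelling the prefix out explicitly could look like (illustrative only; the extra tuple entry is hypothetical and the exact diff in f583ecf is not reproduced here):

```python
# Sketch only: add an explicit prefix for the XPU-style dispatch key.
if (
    name.startswith(
        (
            "aten::cudnn_convolution",
            "aten::convolution",
            "aten::_convolution",
            "aten::convolution_overrideable",  # hypothetical explicit entry
        )
    )
    or "conv" in name
):
    ...
```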

@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Nov 3, 2025
@pytorch-bot

pytorch-bot bot commented Nov 3, 2025

To add the ciflow label ciflow/xpu please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Nov 3, 2025
@guangyey guangyey added ciflow/trunk Trigger trunk jobs on your pull request ciflow/xpu Run XPU CI tasks labels Nov 3, 2025
@pytorch-bot pytorch-bot bot removed ciflow/trunk Trigger trunk jobs on your pull request ciflow/inductor ciflow/xpu Run XPU CI tasks labels Nov 3, 2025
@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Nov 3, 2025
@guangyey guangyey requested a review from jansel November 3, 2025 08:13
@guangyey guangyey added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 3, 2025
@guangyey
Collaborator

guangyey commented Nov 4, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@Stonepia Stonepia deleted the tong/conv_flop branch November 4, 2025 05:58
pytorch-bot bot pushed a commit that referenced this pull request Nov 4, 2025
Fixes #166838
1. Register the `convolution_overrideable` key for flop_counter. CUDA relies on keys such as `cudnn_convolution`; devices like `XPU` fall back to `convolution_overrideable`. Without the correct registration, the flop_counter silently returns 0 for XPU at:
https://github.com/pytorch/pytorch/blob/e1d011d6eb571cd98ec7c7ed8e8b518a5463ec97/torch/_inductor/analysis/profile_analysis.py#L178-L179

2. Enable the corresponding tests for XPU in `test_analysis.py`.

Pull Request resolved: #166839
Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/jansel
pytorchmergebot pushed a commit that referenced this pull request Nov 11, 2025
This PR enables XPU devices in test_analysis.py.

For performance reasons, it skips some slow tests; the full scope can be enabled with:

```
export PYTORCH_TEST_WITH_SLOW=1
```
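
As a usage note, the slow tests can then be run directly; the test path below is an assumption (this PR touches test_analysis.py under the inductor test directory), not a quote from the PR:

```
PYTORCH_TEST_WITH_SLOW=1 python test/inductor/test_analysis.py -v
```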

**PR Stack:**

- #166840 : This PR enables the tests and skips the ones that currently fail
- #166839 : This PR fixes the bug and enables the full test scope for XPU

**Some skipped test time:**

```
test_augment_trace_against_flop_counter_maxat0_xpu_float16 [49.0863s]
test_augment_trace_against_flop_counter_maxat0_xpu_float32 [18.2268s]
test_augment_trace_against_flop_counter_maxat1_xpu_float16 [85.6549s]
test_augment_trace_against_flop_counter_maxat1_xpu_float32 [329.0832s]
test_augment_trace_against_flop_counter_maxat2_xpu_float16 [24.4825s]
test_augment_trace_against_flop_counter_maxat2_xpu_float32 [19.0688s]
```

Pull Request resolved: #166840
Approved by: https://github.com/guangyey, https://github.com/jansel
Silv3S pushed a commit to Silv3S/pytorch that referenced this pull request Nov 18, 2025

Labels

ciflow/inductor
ciflow/trunk (Trigger trunk jobs on your pull request)
ciflow/xpu (Run XPU CI tasks)
Merged
module: inductor
module: xpu (Intel XPU related issues)
open source
topic: not user facing (topic category)

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[XPU] Flops count for xpu models is not correct

6 participants