[xpu] Support high stream for ProcessGroupXCCL#163049

Closed
Chao1Han wants to merge 3 commits into pytorch:main from Chao1Han:high_stream

Conversation

@Chao1Han (Contributor) commented Sep 16, 2025

This PR adds high-priority stream support to ProcessGroupXCCL. Like CUDA streams, XPU streams can run at a higher priority than other streams. The implementation is in intel/torch-xpu-ops#1715; this PR adds the registration on the PyTorch side.

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci

@pytorch-bot pytorch-bot bot added oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (c10d) release notes category labels Sep 16, 2025
pytorch-bot bot commented Sep 16, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163049

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 65d78d2 with merge base f2bb22f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Chao1Han Chao1Han marked this pull request as draft September 16, 2025 07:13
github-merge-queue bot pushed a commit to intel/torch-xpu-ops that referenced this pull request Sep 17, 2025
Support high-priority streams for XCCL; the test case is added in
#2049.
This PR needs to merge first, followed by the upstream op registration in
pytorch/pytorch#163049, before the test case can
pass.

---------

Co-authored-by: mengfei25 <mengfei.li@Intel.com>
@EikanWang EikanWang requested a review from Copilot September 17, 2025 01:25
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds high priority stream support for ProcessGroupXCCL, bringing it in line with CUDA's stream priority capabilities. The implementation enables XPU streams to execute with higher priority compared to other streams.

  • Adds a new constructor overload for ProcessGroupXCCL that accepts store, rank, and size parameters with default low priority stream configuration
  • Extends the Options class to include is_high_priority_stream parameter with proper Python bindings
  • Provides read/write access to the high priority stream option through Python properties
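
As a rough illustration of the three points above, here is a minimal pure-Python sketch. The class names `XCCLOptionsSketch` and `ProcessGroupXCCLSketch` and the `stream_kind` helper are hypothetical stand-ins, not the real bindings; only the `is_high_priority_stream` name and the store/rank/size arguments come from the summary. It models the default low-priority configuration and the read/write option, assuming the real classes behave along these lines.

```python
from dataclasses import dataclass


@dataclass
class XCCLOptionsSketch:
    """Hypothetical stand-in for ProcessGroupXCCL.Options (not the real binding)."""

    # Read/write flag, mirroring the property described in the summary.
    is_high_priority_stream: bool = False


class ProcessGroupXCCLSketch:
    """Hypothetical stand-in showing the two constructor shapes."""

    def __init__(self, store, rank, size, options=None):
        self.store = store
        self.rank = rank
        self.size = size
        # The store/rank/size-only form falls back to default (low-priority) options.
        self.options = options if options is not None else XCCLOptionsSketch()

    def stream_kind(self):
        # Report which kind of stream this group would request (illustrative only).
        return "high" if self.options.is_high_priority_stream else "default"


# A plain dict stands in for the store; the real API expects a c10d Store.
default_pg = ProcessGroupXCCLSketch(store={}, rank=0, size=1)
opts = XCCLOptionsSketch(is_high_priority_stream=True)
high_pg = ProcessGroupXCCLSketch(store={}, rank=0, size=1, options=opts)
print(default_pg.stream_kind(), high_pg.stream_kind())
```

With a real XCCL-enabled build, the equivalent step would be constructing the actual Options object and setting `is_high_priority_stream` on it before creating the process group; the exact import path and constructor arguments are not shown in this thread.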

mengfei25 added a commit to intel/torch-xpu-ops that referenced this pull request Sep 17, 2025
Support high-priority streams for XCCL; the test case is added in
#2049.
This PR needs to merge first, followed by the upstream op registration in
pytorch/pytorch#163049, before the test case can
pass.

---------

Co-authored-by: mengfei25 <mengfei.li@Intel.com>
@guangyey (Collaborator) commented

@Chao1Han You need to update torch-xpu-ops as well.

@guangyey guangyey moved this to Review Required in PyTorch Intel Sep 18, 2025
@guangyey guangyey moved this from Review Required to Pre-Review Required in PyTorch Intel Sep 18, 2025
@Chao1Han (Contributor, Author) commented

> @Chao1Han You need to update torch-xpu-ops as well.

Sure, I'll update the pinned commit here as well.

@Chao1Han Chao1Han marked this pull request as ready for review September 18, 2025 03:15
@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Sep 18, 2025
@guangyey (Collaborator) left a comment

LGTM.

@guangyey guangyey added release notes: xpu release notes category ciflow/trunk Trigger trunk jobs on your pull request labels Sep 18, 2025
pytorch-bot bot commented Sep 18, 2025

To add the ciflow label ciflow/trunk please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/trunk Trigger trunk jobs on your pull request label Sep 18, 2025
@guangyey (Collaborator) commented

I think you'd better update the pin in a separate PR.

@Chao1Han (Contributor, Author) commented

> I think you'd better update the pin in a separate PR.

Sure, I will wait for the pin commit update before merging this PR.

@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Sep 18, 2025
@Chao1Han (Contributor, Author) commented

@pytorchbot rebase -b main

pytorch-bot bot commented Oct 21, 2025

To add the ciflow label ciflow/xpu please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Oct 21, 2025
@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Oct 21, 2025
@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Oct 21, 2025
@guangyey guangyey added ciflow/trunk Trigger trunk jobs on your pull request ciflow/xpu Run XPU CI tasks labels Oct 21, 2025
@albanD (Collaborator) left a comment

SGTM

@Chao1Han (Contributor, Author) commented

@pytorchmergebot merge

@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@github-project-automation github-project-automation bot moved this from Review Required to Done in PyTorch Intel Oct 22, 2025
zhudada0120 pushed a commit to zhudada0120/pytorch that referenced this pull request Oct 22, 2025
Add high-priority stream support to ProcessGroupXCCL. Like CUDA streams, XPU streams can run at a higher priority than other streams. The implementation is in intel/torch-xpu-ops#1715; this PR adds the registration on the PyTorch side.

Pull Request resolved: pytorch#163049
Approved by: https://github.com/guangyey, https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/albanD
github-merge-queue bot pushed a commit to intel/torch-xpu-ops that referenced this pull request Oct 27, 2025
Feature #1715 and the registration in
pytorch/pytorch#163049 have merged. Add some
high-priority stream test cases.
pytorchmergebot pushed a commit that referenced this pull request Oct 30, 2025
After #163049, this PR fixes the type annotations to match the actual implementation for ProcessGroupXCCL::Options.
Pull Request resolved: #166418
Approved by: https://github.com/guangyey, https://github.com/ezyang
BoyuanFeng pushed a commit that referenced this pull request Oct 31, 2025
After #163049, this PR fixes the type annotations to match the actual implementation for ProcessGroupXCCL::Options.
Pull Request resolved: #166418
Approved by: https://github.com/guangyey, https://github.com/ezyang
etaf pushed a commit to etaf/pytorch-inductor-xpu that referenced this pull request Nov 4, 2025
After pytorch#163049, this PR fixes the type annotations to match the actual implementation for ProcessGroupXCCL::Options.
Pull Request resolved: pytorch#166418
Approved by: https://github.com/guangyey, https://github.com/ezyang
Labels

ciflow/trunk, ciflow/xpu, Merged, oncall: distributed, open source, release notes: distributed (c10d), release notes: distributed (checkpoint), release notes: xpu

Projects

Archived in project

7 participants