
[c10d] Fix split_group bug by having the parent pg option deep copied#167125

Closed

fduwjj wants to merge 1 commit into pytorch:main from fduwjj:export-D86225394

Conversation

fduwjj (Contributor) commented Nov 5, 2025

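Summary: Inside the split_group API, if no PG option is explicitly specified, the new PG shares a reference to the parent PG's option object instead of getting its own. This breaks when the parent PG is split multiple times: every split ends up using the same option object, and we run into errors. This change deep copies the parent PG's option instead.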

Test Plan: UT + internal test.

Differential Revision: D86225394

cc @H-Huang @awgu @wanchaol @fegin @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci
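The aliasing problem in the summary can be illustrated with a minimal pure-Python sketch. `PGOptions`, `ProcessGroup`, `split_group_buggy`, and `split_group_fixed` are hypothetical stand-ins, not the real c10d classes or the actual `split_group` implementation; the sketch assumes that splitting customizes per-group fields (here `group_name` and `global_ranks`) on whatever option object it is handed.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PGOptions:
    # Hypothetical stand-in for backend-specific process-group options.
    timeout_s: int = 600
    group_name: str = ""
    global_ranks: List[int] = field(default_factory=list)


@dataclass
class ProcessGroup:
    # Hypothetical stand-in for a process group that keeps its options object.
    name: str
    options: PGOptions


def split_group_buggy(parent: ProcessGroup, name: str, ranks: List[int],
                      options: Optional[PGOptions] = None) -> ProcessGroup:
    # Bug pattern: without explicit options, reuse the parent's object by reference.
    opts = options if options is not None else parent.options
    # These writes land on the parent's options (and on every earlier child's).
    opts.group_name = name
    opts.global_ranks = list(ranks)
    return ProcessGroup(name, opts)


def split_group_fixed(parent: ProcessGroup, name: str, ranks: List[int],
                      options: Optional[PGOptions] = None) -> ProcessGroup:
    # Fix pattern: deep copy the parent's options before customizing them.
    opts = options if options is not None else copy.deepcopy(parent.options)
    opts.group_name = name
    opts.global_ranks = list(ranks)
    return ProcessGroup(name, opts)


if __name__ == "__main__":
    world = ProcessGroup("world", PGOptions(group_name="world", global_ranks=[0, 1, 2, 3]))
    a = split_group_buggy(world, "a", [0, 1])
    b = split_group_buggy(world, "b", [2, 3])
    # The first child and the parent both report the second split's name.
    print(a.options.group_name, world.options.group_name)  # prints "b b"

    world = ProcessGroup("world", PGOptions(group_name="world", global_ranks=[0, 1, 2, 3]))
    c = split_group_fixed(world, "c", [0, 1])
    d = split_group_fixed(world, "d", [2, 3])
    # Each group keeps its own options.
    print(c.options.group_name, d.options.group_name, world.options.group_name)  # prints "c d world"
```

A shallow `copy.copy` would still share nested mutable fields (such as a list that is later mutated in place) between parent and child; `copy.deepcopy` avoids that.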

pytorch-bot bot commented Nov 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167125

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e746d51 with merge base 13d2cc7:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot added the labels oncall: distributed and release notes: distributed (c10d) on Nov 5, 2025
meta-codesync bot commented Nov 5, 2025

@fduwjj has exported this pull request. If you are a Meta employee, you can view the originating Diff in D86225394.

pytorch-bot added the label ciflow/trunk (triggers trunk jobs on the pull request) on Nov 5, 2025
…pytorch#167125)

Summary:

Inside the split_group API, we share a reference to the parent PG's option if a PG option is not explicitly specified. This is bad because if we split the parent PG multiple times, we run into errors.

Test Plan: UT + internal test.

Reviewed By: MogicianWu

Differential Revision: D86225394
facebook-github-bot (Contributor) commented:

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Labels: ciflow/trunk, fb-exported, Merged, meta-exported, oncall: distributed, release notes: distributed (c10d)
