🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166182
Note: Links to docs will display an error until the docs builds have been completed.
❗ 2 Active SEVs: There are 2 currently active SEVs. If your PR is affected, please view them below:
✅ You can merge normally! (1 Unrelated Failure) As of commit 588f15c with merge base 0b68814 ( UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
fduwjj left a comment:
This change looks weird to me. Any chance we can do this inside torchft?
@fduwjj Not really: the name is set at init time, and FR (flight recorder) also takes it at init time. The change is backward compatible, so it shouldn't cause any issues, i.e. the code without the
@tushar00jain I mean, after you call init_process_group, you can always change the pg name, right?
@fduwjj It's tricky, and I'm not sure it's a good idea. Some places use the name during init; FR is one example. If those places capture the name at init and we change it later, the rename doesn't really help.
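A toy sketch of why renaming after init doesn't help, assuming (as described in the thread) that a component like the flight recorder copies the PG name once at init. `ToyProcessGroup` and `ToyRecorder` are hypothetical stand-ins, not PyTorch classes.

```python
class ToyRecorder:
    """Hypothetical stand-in for a component that reads the PG name once, at init."""

    def __init__(self, pg_name: str) -> None:
        self.pg_name = pg_name  # copied once; later renames are not seen
        self.entries = []

    def record(self, op: str) -> None:
        # Every entry is tagged with the name captured at init time.
        self.entries.append((self.pg_name, op))


class ToyProcessGroup:
    """Hypothetical process group that hands its name to a recorder at init."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.recorder = ToyRecorder(name)

    def allreduce(self) -> None:
        self.recorder.record("allreduce")


pg = ToyProcessGroup("0")
pg.name = "task_group_1"  # rename after init
pg.allreduce()
print(pg.recorder.entries)  # [('0', 'allreduce')]
```

The rename updates the group object, but the recorder still tags entries with `"0"`, so the name has to be correct by the time init runs.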
Force-pushed from 6e64b1b to 9c6b5ee.
Summary:
- In torchft we have multiple default PGs, one for each task group.
- For flight recorder to work, each of these needs a different name, so that entries can be matched.
- Change the `init_process_group` API to optionally take a list of ranks. If provided, the hash of the ranks is used as the PG name. For torchft, we'll pass global ranks here so the default PG has a different name in each task group.
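A minimal sketch of the naming rule in the summary, written in plain Python rather than PyTorch internals. `pg_name_for` is a hypothetical helper; the thread doesn't name the new `init_process_group` parameter or the exact hashing scheme, so SHA-1 over the `_`-joined ranks is an assumption.

```python
import hashlib
from itertools import count
from typing import Optional, Sequence

# Hypothetical sequential fallback, standing in for the pre-existing "0", "1", ... names.
_group_counter = count()


def pg_name_for(ranks: Optional[Sequence[int]] = None) -> str:
    """Hypothetical helper: derive a process-group name from an optional rank list."""
    if ranks is None:
        # No ranks given: keep the old behavior (sequential name).
        return str(next(_group_counter))
    # Ranks given: hash them. hashlib is deterministic across processes,
    # unlike the built-in hash() on strings, which is randomized per process.
    joined = "_".join(str(r) for r in ranks)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


# Without ranks, the first default PG in *each* task group's process is named "0",
# so flight recorder entries from different task groups share a name.
legacy_name = pg_name_for()

# With global ranks, two task groups' default PGs get distinct names.
name_a = pg_name_for([0, 1, 2, 3])
name_b = pg_name_for([4, 5, 6, 7])

# Every member of a group hashes the same rank list, so they agree on the name.
same_a = pg_name_for([0, 1, 2, 3])

print(legacy_name, name_a != name_b, name_a == same_a)  # 0 True True
```

Leaving `ranks` unset keeps the sequential name, which mirrors the backward-compatibility point raised in review.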
Force-pushed from 9c6b5ee to 588f15c.
@pytorchbot merge
Merge started: your change will be merged once all checks pass (ETA 0-4 hours).
Pull Request resolved: #166182
Approved by: https://github.com/fduwjj
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci