-
Notifications
You must be signed in to change notification settings - Fork 26.3k
[PyTorch Distributed] Move NCCL_DEBUG print to after NCCL init #74287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
CI Flow Status⚛️ CI FlowRuleset - Version:
|
🔗 Helpful links
💊 CI failures summary and remediationsAs of commit 10289a9 (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Please report bugs/suggestions to the (internal) Dr. CI Users group. |
|
This pull request was exported from Phabricator. Differential Revision: D34912492 |
awgu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved internally.
…ch#74287) Summary: Pull Request resolved: pytorch#74287 NCCL may read env vars from nccl.conf, it would be more accurate to get envs after NCCL inits Test Plan: unit test Reviewed By: ajauhri Differential Revision: D34912492 fbshipit-source-id: fe43bd66eec7bbec9987f77f68714a48e1309737
|
This pull request was exported from Phabricator. Differential Revision: D34912492 |
da9aa65 to
10289a9
Compare
| if (getRank() == 0) { | ||
| LOG(INFO) << "NCCL_DEBUG: " << parse_env("NCCL_DEBUG"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we print this on all ranks? The log above "ProcessGroupNCCL initialized with following options" is printed on all ranks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is more for verbosity consideration - not sure if a user wants 1000 lines of NCCL_DEBUG=INFO printed in the log. The other options, by contrast, need to be consistent across all ranks, otherwise there would be a hang. So there may still a value in printing them on all ranks. Although, I am happy to simplify those as well :)
Summary: Pull Request resolved: #74287 NCCL may read env vars from nccl.conf, it would be more accurate to get envs after NCCL inits Test Plan: unit test Reviewed By: ajauhri Differential Revision: D34912492 fbshipit-source-id: d794935573a59fd25f8cbfde6b64978e063e4d7a
|
Hey @kwen2501. |
Summary:
NCCL may read env vars from nccl.conf, it would be more accurate to get
envs after NCCL inits
Test Plan: unit test
Differential Revision: D34912492