[NCCL] Helper Function to Abort All Outstanding NCCL Communicators #40945
This adds a helper function to abort all NCCL communicators associated with a WorkNCCL object.

Differential Revision: [D22127899](https://our.internmc.facebook.com/intern/diff/D22127899/)

[ghstack-poisoned]
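A minimal sketch of what such a helper could look like, assuming a WorkNCCL object holds one shared communicator per device that took part in the collective. `FakeNCCLComm`, `WorkNCCLSketch`, and `abortAllCommunicators` are hypothetical names for illustration, not the code from this PR. A real communicator wrapper would call NCCL's `ncclCommAbort`; here the abort only records state so the sketch builds and runs without NCCL or a GPU.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a communicator wrapper. A real one would own an
// ncclComm_t and call ncclCommAbort() on it; this one only records the abort.
class FakeNCCLComm {
 public:
  void abort() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (aborted_) {
      return;  // a second abort is treated as a no-op
    }
    aborted_ = true;
  }

  bool isAborted() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return aborted_;
  }

 private:
  mutable std::mutex mutex_;
  bool aborted_ = false;
};

// Hypothetical, heavily simplified work object: it keeps the communicators
// (one per device) that the collective ran on.
class WorkNCCLSketch {
 public:
  explicit WorkNCCLSketch(std::vector<std::shared_ptr<FakeNCCLComm>> comms)
      : ncclComms_(std::move(comms)) {}

  // Abort every communicator this work object used, e.g. after it times out.
  void abortAllCommunicators() {
    for (const auto& comm : ncclComms_) {
      assert(comm != nullptr);  // every slot is expected to hold a communicator
      comm->abort();
    }
  }

 private:
  std::vector<std::shared_ptr<FakeNCCLComm>> ncclComms_;
};
```

Because the communicators are held through `std::shared_ptr`, aborting them from the work object also affects any other owner of the same communicator (such as a process-group-level cache), which is the trade-off the discussion below weighs against handling timeouts elsewhere.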
💊 CI failures summary and remediations

As of commit 4af9f51 (more details on the Dr. CI page):

ci.pytorch.org: 1 failed
Wondering whether this helper function is used anywhere in this stack.
It was used in a prior version of #40946, but not anymore. Previously, we weren't overwriting the
If this is not used, should we move this PR out of the stack and land it only when it becomes necessary?
Closing this PR, since aborting NCCL communicators at the WorkNCCL level is no longer necessary given the updated design for work timeouts.