[NCCL] Add timeout to ProcessGroup Work Wait #40944
This pull request has been merged in 9d92fa2.
Stack from ghstack:
This stack adds Work-level timeout for blocking wait.
This PR only changes the API: the wait function in each ProcessGroup backend now accepts a timeout argument with a default value. The ProcessGroup superclass honors the given timeout by changing its condition variable (CV) wait to wait_for.
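The change from a CV wait to wait_for can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `SketchWork`, `finish`, the `kNoTimeout` sentinel value, and the choice to throw `std::runtime_error` on timeout are all assumptions made for illustration. Only `std::condition_variable::wait` and `wait_for` are standard library facts.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <stdexcept>

// Hypothetical sentinel meaning "block until the work completes".
constexpr std::chrono::milliseconds kNoTimeout{0};

// Hypothetical stand-in for a Work object; not the actual ProcessGroup class.
class SketchWork {
 public:
  // Marks the work as completed and wakes every waiting thread.
  void finish() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      completed_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until finish() has been called or the timeout elapses.
  // Returns true on completion; throws std::runtime_error on timeout
  // (an assumption of this sketch).
  bool wait(std::chrono::milliseconds timeout = kNoTimeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (timeout == kNoTimeout) {
      // Previous behavior: wait indefinitely on the condition variable.
      cv_.wait(lock, [this] { return completed_; });
    } else if (!cv_.wait_for(lock, timeout, [this] { return completed_; })) {
      // wait_for returned false: the predicate was still false at the deadline.
      throw std::runtime_error("Operation timed out!");
    }
    return true;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool completed_ = false;
};
```

Because the timeout parameter has a default, existing callers that invoke `wait()` with no arguments keep the blocking behavior, while new callers can pass a bound such as `std::chrono::milliseconds(500)`.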
Closes: #37571
Differential Revision: D22107135
NOTE FOR REVIEWERS: This PR has internal Facebook-specific changes or comments; please review them on Phabricator!