BAND, BOR and BXOR for NCCL (all_)reduce should throw runtime errors #42669
💊 CI failures summary and remediations
As of commit 1130ac4 (more details on the Dr. CI page):
ci.pytorch.org: 2 failed
rohan-varma
left a comment
Thanks for the contribution, it looks great! Requesting changes for the assertRaisesRegex, and there are also a couple of python lint issues. You can view those by clicking on the failed CI job (we should have inline annotations as well, but those seem to not be working right now).
rohan-varma
left a comment
Looks good, thanks @thinking-tower!
facebook-github-bot
left a comment
@rohan-varma has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@rohan-varma merged this pull request in 6ebc050.
cc @rohan-varma
Fixes #41362 #39708
Description
NCCL doesn't support `BAND`, `BOR`, or `BXOR`. Since the current mapping doesn't contain any of these bitwise operators, a default value of `ncclSum` is used instead. This PR provides the expected behaviour: a runtime exception is thrown.
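
To illustrate the difference between the silent fallback and the proposed behaviour, here is a minimal, self-contained Python sketch. It is not the actual change (which lives in the NCCL process group code); the `ReduceOp` enum, the `NCCL_OP` table, both lookup functions, and the error message text are hypothetical stand-ins chosen for illustration only.

```python
from enum import Enum


class ReduceOp(Enum):
    """Hypothetical stand-in for the reduce operations a user can request."""
    SUM = 0
    PRODUCT = 1
    MIN = 2
    MAX = 3
    BAND = 4
    BOR = 5
    BXOR = 6


# Hypothetical mapping to NCCL reduction names.
# The bitwise operators are deliberately absent, as described above.
NCCL_OP = {
    ReduceOp.SUM: "ncclSum",
    ReduceOp.PRODUCT: "ncclProd",
    ReduceOp.MIN: "ncclMin",
    ReduceOp.MAX: "ncclMax",
}


def lookup_with_silent_default(op):
    """Model of the old behaviour: a missing entry falls back to ncclSum."""
    return NCCL_OP.get(op, "ncclSum")


def lookup_strict(op):
    """Model of the behaviour this PR asks for: a missing entry raises."""
    try:
        return NCCL_OP[op]
    except KeyError:
        # Illustrative message only; the real wording may differ.
        raise RuntimeError(f"Cannot use ReduceOp.{op.name} with NCCL") from None
```

With the silent default, `lookup_with_silent_default(ReduceOp.BAND)` quietly returns `"ncclSum"`, so a caller asking for a bitwise AND would get a sum. With the strict lookup, the same request raises a `RuntimeError`, which a test can check with `assertRaisesRegex`.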
Notes