Skip to content

Conversation

@thinking-tower
Copy link
Contributor

@thinking-tower thinking-tower commented Aug 6, 2020

cc @rohan-varma
Fixes #41362 #39708

Description

NCCL doesn't support BAND, BOR, BXOR. Since the current mapping doesn't contain any of the mentioned bitwise operator, a default value of ncclSum is used instead.

This PR should provide the expected behaviour where a runtime exception is thrown.

Notes

@dr-ci
Copy link

dr-ci bot commented Aug 6, 2020

💊 CI failures summary and remediations

As of commit 1130ac4 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-CircleCI failure(s)

ci.pytorch.org: 2 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 7 times.

@rohan-varma rohan-varma self-requested a review August 6, 2020 17:36
Copy link
Contributor

@rohan-varma rohan-varma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution, it looks great! Requesting changes for the assertRaisesRegex, and there are also a couple of python lint issues. You can view those by clicking on the failed CI job (we should have inline annotations as well, but those seem to not be working right now).

@thinking-tower thinking-tower changed the title Use BAND, BOR and BXOR for NCCL (all_)reduce throws a runtime error. BAND, BOR and BXOR for NCCL (all_)reduce should throw runtime errors Aug 7, 2020
@ailzhang ailzhang added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Aug 7, 2020
Copy link
Contributor

@rohan-varma rohan-varma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @thinking-tower!

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rohan-varma has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@rohan-varma merged this pull request in 6ebc050.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

torch.distributed NCCL backend does not support bitwise reduction ops

6 participants