Conversation


@rohan-varma commented Jul 23, 2020

Resubmit of #41318 pushed to ci-all branch.

Original description:
Closes #24137.
This PR adds support for the torch.bool tensor type to ProcessGroupNCCL. For most types we use the existing mapping, but since bool is not supported as a native ncclDataType_t, we add the following logic:

Map at::kBool to ncclUint8
During reduction (allreduce, for example), if the operation is SUM, we instead override it to MAX to avoid overflow issues. The rest of the operations work unchanged. In the boolean case, changing SUM to MAX makes no correctness difference, since both function as a bitwise OR.
The reduction logic (for example for reduce/allreduce) is as follows:
sum, max = bitwise OR
product, min = bitwise AND
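This equivalence is easy to check exhaustively. A minimal sketch (plain Python, no torch) verifying that for 0/1 values SUM (clamped to the dtype's range) and MAX both behave as a bitwise OR, while PRODUCT and MIN both behave as a bitwise AND:

```python
# Exhaustively check, for all 3-"rank" boolean inputs, that:
#   max == bitwise OR  == sum clamped to {0, 1}
#   min == bitwise AND == product
from functools import reduce
from itertools import product

for values in product([0, 1], repeat=3):
    bit_or = reduce(lambda a, b: a | b, values)
    bit_and = reduce(lambda a, b: a & b, values)
    assert max(values) == bit_or == min(sum(values), 1)
    prod = reduce(lambda a, b: a * b, values)
    assert min(values) == bit_and == prod
```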

Note that this PR doesn't add support for BAND/BOR/BXOR, since those reduction ops are not currently supported by the NCCL backend; see #41362

Closes #24137. Since bool is
not supported as a native ncclDataType_t, we add some upcasting + downcasting
logic to support it.

Differential Revision: [D22496604](https://our.internmc.facebook.com/intern/diff/D22496604/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook-specific changes or comments; please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22496604/)!

[ghstack-poisoned]
Closes #24137. 
This PR adds support for the `torch.bool` tensor type to ProcessGroupNCCL. For most types we use the existing mapping, but since `bool` is not supported as a native `ncclDataType_t`, we add the following logic:
1) Detect if input tensors are of bool type. If so, cast inputs & outputs to int tensors. 
2) Run the specified reduction.
3) If we had to cast, cast the outputs back to boolean tensors. If this collective does not operate in-place, re-cast the inputs back to bool so that they are not modified as a result of the op.
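The three steps above can be sketched in plain Python (hypothetical helper names; the real implementation lives in ProcessGroupNCCL's C++ collectives):

```python
# Sketch of the cast -> reduce -> re-cast flow for an out-of-place
# boolean allreduce. `fake_allreduce` stands in for the NCCL collective,
# which only ever sees integer buffers.
def fake_allreduce(int_buffers):
    # Elementwise SUM across all "ranks"; every rank gets the result.
    total = [sum(col) for col in zip(*int_buffers)]
    return [list(total) for _ in int_buffers]

def allreduce_bool(buffers):
    is_bool = all(isinstance(v, bool) for buf in buffers for v in buf)
    ints = [[int(v) for v in buf] for buf in buffers]       # 1) upcast
    outputs = fake_allreduce(ints)                          # 2) reduce
    if is_bool:                                             # 3) downcast
        outputs = [[bool(v) for v in buf] for buf in outputs]
    return outputs

# Two "ranks", two elements each: OR-like semantics after the downcast.
result = allreduce_bool([[True, False], [False, False]])
```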

The reduction logic (for example for reduce/allreduce) is as follows:
sum, max = bitwise OR
product, min = bitwise AND

Note that this PR doesn't add support for BAND/BOR/BXOR, since those reduction ops are not currently supported by the NCCL backend; see #41362

Tests are added to ensure that the reductions work as expected. 
Differential Revision: [D22496604](https://our.internmc.facebook.com/intern/diff/D22496604/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook-specific changes or comments; please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22496604/)!

[ghstack-poisoned]
Closes #24137. 
This PR adds support for the `torch.bool` tensor type to ProcessGroupNCCL. For most types we use the existing mapping, but since `bool` is not supported as a native `ncclDataType_t`, we add the following logic:
1) Map `at::kBool` to `ncclUint8`
2) During reduction (allreduce, for example), if the operation is SUM, we instead override it to MAX to avoid overflow issues. The rest of the operations work unchanged. In the boolean case, changing SUM to MAX makes no correctness difference, since both function as a bitwise OR.

The reduction logic (for example for reduce/allreduce) is as follows:
sum, max = bitwise OR
product, min = bitwise AND

Note that this PR doesn't add support for BAND/BOR/BXOR, since those reduction ops are not currently supported by the NCCL backend; see #41362

Tests are added to ensure that the reductions work as expected. 
Differential Revision: [D22496604](https://our.internmc.facebook.com/intern/diff/D22496604/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook-specific changes or comments; please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22496604/)!

[ghstack-poisoned]

@facebook-github-bot left a comment


@rohan-varma has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@rohan-varma
Contributor Author

@mrshenli, the tests for this diff should be fixed now and ci-all is green. The bug was that in the nccl_bool_reduce test we should only assert on the rank that receives the final reduced tensor, not on the other ranks.
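As a rough illustration (plain Python, not the actual test code) of why the assertion belongs only on the destination rank: after a `reduce` with `dst=0`, only rank 0 is guaranteed to hold the reduced value; the other ranks' buffers are left unspecified.

```python
# Toy model of reduce(dst=0) over per-rank boolean flags. For booleans,
# SUM behaves like a bitwise OR, so the reduced value is any(flags).
def fake_reduce(rank_values, dst=0):
    reduced = any(rank_values)
    # Only the destination rank receives the reduced result; the other
    # ranks keep whatever was in their buffer.
    return [reduced if rank == dst else value
            for rank, value in enumerate(rank_values)]

out = fake_reduce([False, True, False], dst=0)
assert out[0] is True   # assert only on the dst rank...
# ...and make no assumptions about out[1] / out[2].
```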

@rohan-varma requested a review from mrshenli July 24, 2020 18:31

@facebook-github-bot left a comment


@rohan-varma has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@facebook-github-bot left a comment


@rohan-varma has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


dr-ci bot commented Jul 24, 2020

💊 CI failures summary and remediations

As of commit 73040f7 (more details on the Dr. CI page):


None of the CI failures appear to be your fault 💚



❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_windows_vs2017_14.13_py36_cuda10.1_build (1/1)

Step: "Checkout code" (full log | diagnosis details | 🔁 rerun) ❄️

Writing SSH key for checkout to id_rsa
Creating .ssh directory
Adding the following entries to known_hosts:
github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
bitbucket.org ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAubiN81eDcafrgMeLzaFPsw2kNvEcqTKl/VqLat/MaB33pZy0y3rJZtnqwR2qOOvbwKZYKiEO1O6VqNEBxKvJJelCq0dTXWT5pbO2gDXC6h6QDXCaHo6pOHGPUy+YBaGQRGuSusMEASYiWunYN0vCAI8QaXnWMXNMdFP3jHAJH0eDsoiGnLPBlBp4TNm6rYI74nMzgz3B9IikW4WVK+dc8KZJZWYjAuORU3jc1c/NPskD2ASinf8v3xnfXeukU0sJ5N6m5E8VLjObPEO+mN2t/FZTMZLiFqPWc/ALSqnMnnhwrNi2rbfg/rd/IpL8Le3pSBne8+seeFVBoGqzHM9yXw==

Writing SSH key for checkout to id_rsa


@facebook-github-bot

@rohan-varma merged this pull request in 366c014.

@facebook-github-bot deleted the ci-all/rohan/nccl_bool branch January 27, 2021 18:26
kingchc added a commit to kingchc/torch_ucc that referenced this pull request Apr 6, 2022

Summary:
Pull Request resolved: facebookresearch#64

support torch.bool and convert it to UCC UINT8
 - enable collectives for tensor with torch.bool
 - add `to_ucc_reduceOp` to handle the special cases for reduction operations (SUM -> MAX, PROD -> MIN; other operations are not supported)
 - add `to_ucc_dType` to check that the size of `torch.bool` is 1, since it is implementation-defined; abort the job if it is not
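A rough Python rendition of the reduce-op mapping described above (the actual `to_ucc_reduceOp` helper is C++; the names and pass-through behavior here are a sketch, not the library's API):

```python
# Hypothetical sketch: map reduce ops to safe equivalents for bool
# tensors sent as UINT8.  SUM -> MAX (avoids overflow; both act as OR
# for booleans) and PROD -> MIN (both act as AND); anything else is
# rejected.
BOOL_REDUCE_OP_MAP = {"SUM": "MAX", "PROD": "MIN", "MAX": "MAX", "MIN": "MIN"}

def to_bool_reduce_op(op: str) -> str:
    try:
        return BOOL_REDUCE_OP_MAP[op]
    except KeyError:
        raise ValueError(f"reduce op {op!r} is not supported for bool tensors")

mapped = to_bool_reduce_op("SUM")
```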

Reference:
- support in NCCL PG: pytorch/pytorch#41959

Differential Revision: D35301907

fbshipit-source-id: 53e7c771633e68abfb81981d3e0aa5b723e96bd2
facebook-github-bot pushed a commit to facebookresearch/torch_ucc that referenced this pull request Apr 21, 2022
Summary:
Pull Request resolved: #64

support torch.bool and convert it to UCC UINT8
 - enable collectives for tensor with torch.bool
 - add `to_ucc_reduceOp` to handle the special cases for reduction operations (SUM -> MAX, PROD -> MIN; other operations are not supported)
 - add `to_ucc_dType` to check that the size of `torch.bool` is 1, since it is implementation-defined; abort the job if it is not

Reference:
- support in NCCL PG: pytorch/pytorch#41959

Reviewed By: bryanmr

Differential Revision: D35301907

fbshipit-source-id: eba1264c91ff97a708289e66d763919505e9250f
@ghost mentioned this pull request Feb 6, 2023


Successfully merging this pull request may close these issues.

[distributed] NCCL Backend doesn't support torch.bool data type
