[Resubmit #41318] NCCL backend support for torch bool #41959
Conversation
Closes #24137. Since bool is not supported as a native ncclDataType_t, we add some upcasting + downcasting logic to support it.
Differential Revision: [D22496604](https://our.internmc.facebook.com/intern/diff/D22496604/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22496604/)!
Closes #24137. This PR adds support for the `torch.bool` tensor type to ProcessGroupNCCL. For most types we use the existing mapping, but since `bool` is not supported as a native `ncclDataType_t`, we add the following logic:
1) Detect whether the input tensors are of bool type. If so, cast the inputs and outputs to int tensors.
2) Run the specified reduction.
3) If we had to cast, cast the outputs back to boolean tensors. If the collective does not operate in-place, also re-cast the inputs back to bool so that they are not modified as a result of the op.
The reduction logic (for example for reduce/allreduce) is as follows:
sum, max = bitwise or
product, min = bitwise and
Note that this PR doesn't add support for BAND/BOR/BXOR, because those reduction ops are not currently supported by the NCCL backend; see #41362. Tests are added to ensure that the reductions work as expected.
Differential Revision: [D22496604](https://our.internmc.facebook.com/intern/diff/D22496604/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22496604/)!
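A minimal Python sketch of the cast-and-restore pattern described above, expressed at the user-API level. The actual change lives in ProcessGroupNCCL's C++ collective wrappers; `all_reduce_bool` is a hypothetical helper name used only for illustration:

```python
import torch
import torch.distributed as dist

def all_reduce_bool(tensor: torch.Tensor, op=dist.ReduceOp.SUM) -> torch.Tensor:
    """Hypothetical helper mirroring the upcast/reduce/downcast logic."""
    # 1) Detect bool inputs; NCCL has no native bool type, so upcast to int.
    was_bool = tensor.dtype == torch.bool
    work = tensor.to(torch.int) if was_bool else tensor
    # 2) Run the requested reduction on the integer view.
    dist.all_reduce(work, op=op)
    # 3) Cast the result back to bool. On {0, 1} values, SUM and MAX behave
    #    as logical OR; PROD and MIN behave as logical AND.
    if was_bool:
        tensor.copy_(work != 0)
    return tensor
```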
Closes #24137. This PR adds support for the `torch.bool` tensor type to ProcessGroupNCCL. For most types we use the existing mapping, but since `bool` is not supported as a native `ncclDataType_t`, we add the following logic:
1) Map `at::kBool` to `ncclUint8`.
2) During a reduction (allreduce, for example), if the operation is SUM, override it to MAX to avoid overflow issues. The rest of the operations work with no changes. In the boolean case, changing sum to max makes no correctness difference, since both function as a bitwise OR.
The reduction logic (for example for reduce/allreduce) is as follows:
sum, max = bitwise or
product, min = bitwise and
Note that this PR doesn't add support for BAND/BOR/BXOR, because those reduction ops are not currently supported by the NCCL backend; see #41362. Tests are added to ensure that the reductions work as expected.
Differential Revision: [D22496604](https://our.internmc.facebook.com/intern/diff/D22496604/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22496604/)!
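The dtype/op translation this final version describes, as a hedged Python pseudocode sketch. The real mapping lives in C++ inside ProcessGroupNCCL; the `_NCCL_DTYPE` table and `to_nccl_reduce_op` below are hypothetical names for illustration only:

```python
import torch
import torch.distributed as dist

# Hypothetical illustration of the dtype mapping; the real table is in C++.
_NCCL_DTYPE = {
    torch.uint8:   "ncclUint8",
    torch.int32:   "ncclInt32",
    torch.float32: "ncclFloat32",
    torch.bool:    "ncclUint8",  # bool rides on uint8 (1 byte per element)
}

def to_nccl_reduce_op(op, dtype):
    # SUM on a uint8-backed bool can overflow: e.g. 256 ranks each
    # contributing 1 wraps around to 0. Rewriting SUM to MAX avoids this
    # and is equivalent to logical OR on {0, 1} values.
    if dtype == torch.bool and op == dist.ReduceOp.SUM:
        return dist.ReduceOp.MAX
    return op
```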
facebook-github-bot
left a comment
@rohan-varma has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@mrshenli, The tests for this diff should be fixed now and …
facebook-github-bot
left a comment
@rohan-varma has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
facebook-github-bot
left a comment
@rohan-varma has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
💊 CI failures summary and remediations
As of commit 73040f7 (more details on the Dr. CI page): ✅ None of the CI failures appear to be your fault 💚
❄️ 1 failure tentatively classified as flaky, but reruns have not yet been triggered to confirm.
@rohan-varma merged this pull request in 366c014.
Summary:
Pull Request resolved: #64
Support torch.bool and convert it to UCC UINT8:
- enable collectives for tensors with torch.bool
- add `to_ucc_reduceOp` to handle the special cases for reduction operations (SUM->MAX, PROD->MIN; other operations should not be supported)
- add `to_ucc_dType` to check that the size of `torch.bool` is 1, since it is implementation-defined; abort the job if the size of `torch.bool` is not 1
Reference:
- support in NCCL PG: pytorch/pytorch#41959
Reviewed By: bryanmr
Differential Revision: D35301907
fbshipit-source-id: eba1264c91ff97a708289e66d763919505e9250f
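A hedged sketch of the two checks this UCC commit describes. The actual `to_ucc_reduceOp` and `to_ucc_dType` helpers are C++; the Python analogues below use hypothetical names purely to show the logic:

```python
import torch
import torch.distributed as dist

def to_ucc_reduce_op(op, dtype):
    """Hypothetical analogue of the C++ helper: remap bool reductions."""
    if dtype == torch.bool:
        if op == dist.ReduceOp.SUM:
            return dist.ReduceOp.MAX   # logical OR on {0, 1}
        if op == dist.ReduceOp.PRODUCT:
            return dist.ReduceOp.MIN   # logical AND on {0, 1}
        if op not in (dist.ReduceOp.MAX, dist.ReduceOp.MIN):
            raise ValueError(f"{op} is not supported for torch.bool")
    return op

def check_bool_size():
    # sizeof(bool) is implementation-defined in C++; mapping bool to UINT8
    # is only valid when a bool element occupies exactly 1 byte.
    assert torch.tensor(True).element_size() == 1, \
        "torch.bool element size must be 1 byte for the UINT8 mapping"
```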
Resubmit of #41318 pushed to ci-all branch.
Original description:
Closes #24137.
This PR adds support for the torch.bool tensor type to ProcessGroupNCCL. For most types we use the existing mapping, but since bool is not supported as a native ncclDataType_t, we add the following logic:
1) Map at::kBool to ncclUint8
2) During reduction (allreduce, for example), if the operation is SUM, override it to MAX to avoid overflow issues. The rest of the operations work with no changes. In the boolean case, changing sum to max makes no correctness difference, since both function as a bitwise OR.
The reduction logic (for example for reduce/allreduce) is as follows:
sum, max = bitwise or
product, min = bitwise and
Note that this PR doesn't add support for BAND/BOR/BXOR, because those reduction ops are not currently supported by the NCCL backend; see #41362.
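To illustrate the user-visible behavior this PR enables, a small usage sketch. It assumes an initialized NCCL process group with each rank bound to its CUDA device; the per-rank tensor values in the comments are hypothetical:

```python
import torch
import torch.distributed as dist

# Assumes dist.init_process_group("nccl", ...) has already run and each
# rank has called torch.cuda.set_device(...).
t = torch.tensor([True, False, False], device="cuda")    # rank 0
# t = torch.tensor([False, True, False], device="cuda")  # rank 1

# SUM (and MAX) act as logical OR on bool tensors:
dist.all_reduce(t, op=dist.ReduceOp.SUM)
# t is now [True, True, False] on every rank

# PRODUCT (and MIN) act as logical AND:
# dist.all_reduce(t, op=dist.ReduceOp.PRODUCT)
```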