Add NCCL Alltoall to PT NCCL process group #42514
Conversation
This pull request was exported from Phabricator. Differential Revision: D22917967
💊 CI failures summary and remediations: as of commit c6a84b7 (more details on the Dr. CI page), ci.pytorch.org reports 1 failed job. This comment was automatically generated by Dr. CI and has been revised 7 times.
mrshenli left a comment:
Test errors are irrelevant. Thanks for adding this!
Aug 04 06:24:51 ERROR test/onnx/test_pytorch_onnx_onnxruntime.py - NameError: name 'enableScr...
Aug 04 06:24:51 ERROR test/onnx/test_pytorch_onnx_onnxruntime.py - NameError: name 'enableScr...
Aug 04 06:24:51 ERROR test/onnx/test_pytorch_onnx_onnxruntime.py - NameError: name 'enableScr...
Summary: Pull Request resolved: pytorch#42514

Add Alltoall and Alltoallv to PT NCCL process group using NCCL Send/Recv.

Reviewed By: mrshenli
Differential Revision: D22917967
fbshipit-source-id: e3345ddbb5217dc5fabcd5c02db4cf67becff4b5
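Since NCCL (as of 2.7.x) exposes point-to-point `ncclSend`/`ncclRecv` rather than a built-in all-to-all primitive, the collective is composed from a group of sends and receives issued between `ncclGroupStart` and `ncclGroupEnd`. Below is a minimal sketch of the Alltoallv pattern on raw device buffers; the function name, parameters, and float dtype are illustrative, not the code this PR adds to `ProcessGroupNCCL.cpp`:

```cpp
#include <cuda_runtime.h>
#include <nccl.h>

// Sketch of all-to-all-v composed from ncclSend/ncclRecv (illustrative names).
// sendcounts[i]/recvcounts[i]: number of floats exchanged with rank i.
// sdispls[i]/rdispls[i]: element offsets of those chunks in sendbuf/recvbuf.
ncclResult_t all_to_all_v(const float* sendbuf, const size_t* sendcounts,
                          const size_t* sdispls, float* recvbuf,
                          const size_t* recvcounts, const size_t* rdispls,
                          int nranks, ncclComm_t comm, cudaStream_t stream) {
  ncclGroupStart();  // fuse all point-to-point ops into one group
  for (int peer = 0; peer < nranks; ++peer) {
    ncclSend(sendbuf + sdispls[peer], sendcounts[peer], ncclFloat, peer, comm, stream);
    ncclRecv(recvbuf + rdispls[peer], recvcounts[peer], ncclFloat, peer, comm, stream);
  }
  return ncclGroupEnd();  // ops are enqueued on `stream` asynchronously
}
```

The plain Alltoall is the equal-split special case: every count is `numel / nranks` and the displacements are the corresponding multiples.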
This pull request was exported from Phabricator. Differential Revision: D22917967
Force-pushed from af2fa21 to c6a84b7.
This pull request has been merged in ecb88c5.
Summary: Pull Request resolved: #45900

Use `torch::cuda::nccl::all2all` from `ProcessGroupNCCL.cpp`. Fixes #42517.

Here is the NCCL dependency graph:

```
libnccl.a --> libtorch_cuda.so ---> libtorch_python.so
    |                                   ^
    |                                   |
    --------> libc10d.a -----------------
```

When a static library is linked into a dynamic library or an executable, the linker removes all unused and duplicate symbols from that library unless the `-whole-archive` option is used. Before #42514, every NCCL call made from `ProcessGroupNCCL.cpp` was also made from `torch/csrc/cuda/nccl.cpp`, which is compiled as part of `libtorch_cuda.so`. But adding `ncclSend`/`ncclRecv` to `ProcessGroupNCCL.cpp` forced the linker to embed those symbols into `libtorch_python.so`, which in turn pulled other dependent symbols into that library. This PR keeps the `nccl[Send|Recv]` calls in `libtorch_cuda.so` by implementing `all2all` in `torch_cuda`, and thus avoids double-linking the static library. A more involved, but less error-prone, solution would be to use the wrappers exported in the `torch::cuda::nccl` namespace instead of making direct NCCL API calls.

Test Plan: Imported from OSS
Reviewed By: mingzhe09088
Differential Revision: [D24138011](https://our.internmc.facebook.com/intern/diff/D24138011)
Pulled By: malfet
fbshipit-source-id: 33305197fc7d8707b7fd3a66b543f7733b9241a1
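In other words, the only translation unit that references `ncclSend`/`ncclRecv` is the one compiled into `libtorch_cuda.so`, and `ProcessGroupNCCL.cpp` only calls the exported wrapper. A compressed sketch of the call-site side of that split; the `all2all` signature below is illustrative, not the actual declaration in `torch/csrc/cuda/nccl.h`, and the snippet needs `libtorch_cuda` to link:

```cpp
#include <ATen/ATen.h>
#include <cuda_runtime.h>
#include <nccl.h>

// Declaration of the wrapper exported by libtorch_cuda.so (illustrative
// signature).  Its definition -- the only code that touches ncclSend/ncclRecv --
// is compiled into libtorch_cuda.so, so those NCCL symbols are resolved there.
namespace torch { namespace cuda { namespace nccl {
void all2all(at::Tensor& input, at::Tensor& output, int nranks,
             ncclComm_t comm, cudaStream_t stream);
}}} // namespace torch::cuda::nccl

// Illustrative stand-in for the ProcessGroupNCCL.cpp call site: it references
// only the wrapper above, so linking this object into libc10d.a /
// libtorch_python.so no longer drags the NCCL point-to-point symbols with it.
void launch_alltoall_single(at::Tensor& input, at::Tensor& output, int nranks,
                            ncclComm_t comm, cudaStream_t stream) {
  torch::cuda::nccl::all2all(input, output, nranks, comm, stream);
}
```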
Summary: NCCL `alltoall_single` was already added in #42514; this PR adds NCCL `alltoall`. The difference is that `alltoall_single` works on a single tensor and sends/receives slices of that tensor, while `alltoall` works on a list of tensors and sends/receives the tensors in that list. cc: ptrblck ngimel

Pull Request resolved: #44374
Reviewed By: zhangguanheng66, mrshenli
Differential Revision: D24455427
Pulled By: srinivas212
fbshipit-source-id: 42fdebdd14f8340098e2c34ef645bd40603552b1
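To make the distinction concrete, here is a minimal sketch of the list-based variant, where each peer exchanges a whole tensor rather than a slice of one contiguous buffer. The function name and signature are illustrative, not the actual `ProcessGroupNCCL` implementation, and an already-initialized communicator and stream are assumed:

```cpp
#include <vector>
#include <ATen/ATen.h>
#include <cuda_runtime.h>
#include <nccl.h>

// Illustrative sketch of the list-based all-to-all: inputs[i] goes to rank i,
// outputs[i] is received from rank i; tensors in the list may differ in size.
void all_to_all_tensor_list(std::vector<at::Tensor>& outputs,
                            std::vector<at::Tensor>& inputs,
                            ncclComm_t comm, cudaStream_t stream) {
  const int nranks = static_cast<int>(inputs.size());
  ncclGroupStart();
  for (int peer = 0; peer < nranks; ++peer) {
    // Exchange raw bytes so the sketch stays dtype-agnostic.
    ncclSend(inputs[peer].data_ptr(),
             inputs[peer].numel() * inputs[peer].element_size(),
             ncclChar, peer, comm, stream);
    ncclRecv(outputs[peer].data_ptr(),
             outputs[peer].numel() * outputs[peer].element_size(),
             ncclChar, peer, comm, stream);
  }
  ncclGroupEnd();
}
```

An `alltoall_single`-style implementation would instead compute the per-peer pointers as offsets into a single input/output tensor, as in the Alltoallv sketch earlier on this page.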