
Conversation

@srinivas212

Differential Revision: D22917967

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D22917967

@dr-ci

dr-ci bot commented Aug 4, 2020

💊 CI failures summary and remediations

As of commit c6a84b7 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed



@srinivas212 changed the title from "(dup) Add NCCL Alltoall to PT NCCL process group" to "Add NCCL Alltoall to PT NCCL process group" on Aug 4, 2020
Contributor

@mrshenli left a comment


Test errors are irrelevant. Thanks for adding this!

Aug 04 06:24:51 ERROR test/onnx/test_pytorch_onnx_onnxruntime.py - NameError: name 'enableScr...
Aug 04 06:24:51 ERROR test/onnx/test_pytorch_onnx_onnxruntime.py - NameError: name 'enableScr...
Aug 04 06:24:51 ERROR test/onnx/test_pytorch_onnx_onnxruntime.py - NameError: name 'enableScr...

Summary:
Pull Request resolved: pytorch#42514

Add Alltoall and Alltoallv to PT NCCL process group using NCCL Send/Recv.

Reviewed By: mrshenli

Differential Revision: D22917967

fbshipit-source-id: e3345ddbb5217dc5fabcd5c02db4cf67becff4b5
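
As a rough sketch (not the PR's actual code) of how an all-to-all exchange is composed from NCCL point-to-point primitives, the loop below issues grouped `ncclSend`/`ncclRecv` calls; the helper name, raw float buffers, and per-peer offset/count arrays are made up for illustration, error checking is omitted, and NCCL >= 2.7 is assumed for send/recv support.

```
#include <nccl.h>
#include <cuda_runtime.h>

// Each rank sends sendcounts[peer] floats (starting at sendoffs[peer]) to `peer`
// and receives recvcounts[peer] floats from it. Equal counts give Alltoall;
// per-peer counts give Alltoallv.
void all2all_via_p2p(const float* sendbuff, const size_t* sendoffs, const size_t* sendcounts,
                     float* recvbuff, const size_t* recvoffs, const size_t* recvcounts,
                     int nranks, ncclComm_t comm, cudaStream_t stream) {
  // Grouping lets NCCL fuse the point-to-point calls into one operation and
  // avoids deadlocks from mismatched send/recv ordering across ranks.
  ncclGroupStart();
  for (int peer = 0; peer < nranks; ++peer) {
    ncclSend(sendbuff + sendoffs[peer], sendcounts[peer], ncclFloat, peer, comm, stream);
    ncclRecv(recvbuff + recvoffs[peer], recvcounts[peer], ncclFloat, peer, comm, stream);
  }
  ncclGroupEnd();
}
```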
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D22917967

@facebook-github-bot
Contributor

This pull request has been merged in ecb88c5.

mingzhe09088 pushed a commit that referenced this pull request Oct 7, 2020
Use `torch::cuda::nccl::all2all` from `ProcessGroupNCCL.cpp`

Fixes #42517

Here is a NCCL dependency graph:
```
libnccl.a --> libtorch_cuda.so ---> libtorch_python.so
    |                                   ^
    |                                   |
    --------> libc10d.a -----------------
```
When a static library is linked into a dynamic library or an executable, the linker removes all unused/duplicate symbols from that library unless the `--whole-archive` option is used. Before #42514, all NCCL calls made from `ProcessGroupNCCL.cpp` were also made from `torch/csrc/cuda/nccl.cpp`, which is compiled as part of `libtorch_cuda.so`.
But adding `ncclSend`/`ncclRecv` to `ProcessGroupNCCL.cpp` forced the linker to embed those symbols into `libtorch_python.so`, which also pulled other dependent symbols into that library.

This PR adds the `ncclSend`/`ncclRecv` calls to `libtorch_cuda.so` by implementing `all2all` in `torch_cuda`, and thus avoids linking the static library twice.

A more involved, but less error-prone, solution would be to use the wrappers exported in the `torch::cuda::nccl` namespace instead of making direct NCCL API calls.
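
To make the layering concrete, here is a minimal sketch of the structure the fix aims for; the names and empty parameter lists are simplified placeholders, not the real signatures (the actual `all2all` lives in the `torch::cuda::nccl` namespace, implemented alongside `torch/csrc/cuda/nccl.cpp`).

```
// torch_cuda side (compiled into libtorch_cuda.so): the only code that
// references NCCL symbols directly, so libnccl.a is resolved here exactly once.
namespace torch { namespace cuda { namespace nccl {

void all2all(/* in the real code: tensors, ncclComm_t, cudaStream_t */) {
  // ncclGroupStart(); ncclSend(...); ncclRecv(...); ncclGroupEnd();
}

}}} // namespace torch::cuda::nccl

// c10d side (libc10d.a, linked into libtorch_python.so): calls the wrapper
// instead of ncclSend/ncclRecv directly, so the linker no longer embeds a
// second copy of the NCCL symbols into libtorch_python.so.
void alltoall_base_in_ProcessGroupNCCL(/* simplified */) {
  torch::cuda::nccl::all2all();
}
```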

Differential Revision: [D24138011](https://our.internmc.facebook.com/intern/diff/D24138011)

[ghstack-poisoned]
facebook-github-bot pushed a commit that referenced this pull request Oct 8, 2020
Summary:
Pull Request resolved: #45900

Use `torch::cuda::nccl::all2all` from `ProcessGroupNCCL.cpp`

Fixes #42517

Here is a NCCL dependency graph:
```
libnccl.a --> libtorch_cuda.so ---> libtorch_python.so
    |                                   ^
    |                                   |
    --------> libc10d.a -----------------
```
When a static library is linked into a dynamic library or an executable, the linker removes all unused/duplicate symbols from that library unless the `--whole-archive` option is used. Before #42514, all NCCL calls made from `ProcessGroupNCCL.cpp` were also made from `torch/csrc/cuda/nccl.cpp`, which is compiled as part of `libtorch_cuda.so`.
But adding `ncclSend`/`ncclRecv` to `ProcessGroupNCCL.cpp` forced the linker to embed those symbols into `libtorch_python.so`, which also pulled other dependent symbols into that library.

This PR adds the `ncclSend`/`ncclRecv` calls to `libtorch_cuda.so` by implementing `all2all` in `torch_cuda`, and thus avoids linking the static library twice.

A more involved, but less error-prone, solution would be to use the wrappers exported in the `torch::cuda::nccl` namespace instead of making direct NCCL API calls.

Test Plan: Imported from OSS

Reviewed By: mingzhe09088

Differential Revision: D24138011

Pulled By: malfet

fbshipit-source-id: 33305197fc7d8707b7fd3a66b543f7733b9241a1
mingzhe09088 pushed a commit to mingzhe09088/pytorch-1 that referenced this pull request Dec 4, 2020
facebook-github-bot pushed a commit that referenced this pull request Jan 20, 2021
Summary:
In #42514, NCCL `alltoall_single` was already added. This PR adds NCCL `alltoall`.

The difference between `alltoall_single` and `alltoall` is that `alltoall_single` works on a single tensor and sends/receives slices of that tensor, while `alltoall` works on a list of tensors and sends/receives the tensors in that list.
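
A hedged sketch of the list variant at the NCCL level, to contrast with the single-buffer loop shown earlier in this thread; the helper name and raw float buffers are made up for illustration:

```
#include <nccl.h>
#include <cuda_runtime.h>
#include <vector>

// alltoall (list variant): entry i of `send` goes to rank i and entry i of
// `recv` is filled by rank i, so each peer gets its own buffer (and size)
// rather than an equal slice of one contiguous tensor.
void alltoall_list(const std::vector<const float*>& send, const std::vector<size_t>& send_counts,
                   const std::vector<float*>& recv, const std::vector<size_t>& recv_counts,
                   ncclComm_t comm, cudaStream_t stream) {
  ncclGroupStart();
  for (int peer = 0; peer < static_cast<int>(send.size()); ++peer) {
    ncclSend(send[peer], send_counts[peer], ncclFloat, peer, comm, stream);
    ncclRecv(recv[peer], recv_counts[peer], ncclFloat, peer, comm, stream);
  }
  ncclGroupEnd();
}
```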

cc: ptrblck ngimel

Pull Request resolved: #44374

Reviewed By: zhangguanheng66, mrshenli

Differential Revision: D24455427

Pulled By: srinivas212

fbshipit-source-id: 42fdebdd14f8340098e2c34ef645bd40603552b1