-
Notifications
You must be signed in to change notification settings - Fork 26.3k
[c10d] Make C10d support CPU only build #11513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
apaszke
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert in our build system nor c10d, but looks ok after a cursory look. I have some comments that could be improved. It's not very nice that we have to sprinkle that many ifdefs, and I'm not sure if that will work fine with our binaries, so it might be better to consult someone else too.
| endif() | ||
| else() | ||
| message(STATUS "Building C10D without CUDA support") | ||
| endif() |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| add_definitions(-DUSE_CUDA=1) | ||
| endif() | ||
| else() | ||
| message(STATUS "Building C10D without CUDA support") |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| #ifdef USE_CUDA | ||
| , | ||
| cuda_(false) | ||
| #endif |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| for (size_t i = 0; i < srcSizes.size(); i++) { | ||
| #ifdef USE_CUDA | ||
| deviceGuard.set_index(key.type->is_cuda() ? key.devices[i] : -1); | ||
| #else |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
torch/lib/c10d/test/CMakeLists.txt
Outdated
| c10d_add_test(ProcessGroupGlooAsyncTest.cpp c10d c10d_cuda_test) | ||
| if(DISTRIBUTED_NCCL_FOUND) | ||
| c10d_add_test(ProcessGroupNCCLTest.cpp c10d c10d_cuda_test) | ||
| endif() |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| if(USE_CUDA AND CUDA_FOUND) | ||
| cuda_add_library(c10d_cuda_test CUDATest.cu) | ||
| target_link_libraries(c10d_cuda_test c10d) | ||
| endif() |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
teng-li has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
* master: (165 commits) Aibench for asr decoder Explicitly set locale on docs build. (pytorch#11595) Documentation for debugging JIT Fused weightnorm for ATen (pytorch#10842) Move Type, Tensor, TensorMethods to core. Add reminder % to the jit Fix reloading modules back into python (pytorch#11552) Add trigonometry functions to docs/source/onnx.rst Add EndToEndHybridModel CUDA tests (pytorch#11544) minor formatting error log (pytorch#11528) Warn that export+import module always load onto the CPU (pytorch#11485) caffe2::StorageImpl use at::DataPtr (pytorch#11282) Sync all libnccl soversions, not just libnccl.so.1 (pytorch#11575) Document BatchNorm and update default behavior (pytorch#11484) Typo fix in randomness.rst (pytorch#11571) Move some bmm/baddbmm to ATen (pytorch#11292) Make c10d test work on CPU only build (pytorch#11567) Clean up some C++ cruftiness in the script lexer. Allow setting deletion constant Make C10d support CPU only build (pytorch#11513) ...
This makes torch.distributed works for CPU only build.
Also added one more CI test case to cover MPI CPU build.
All CI tests should cover this change