Conversation


@teng-li teng-li commented Sep 11, 2018

This makes torch.distributed work for CPU-only builds.

Also added one more CI test case to cover the MPI CPU build.
All CI tests should cover this change.

@teng-li teng-li added the `oncall: distributed` label (adds this issue/PR to the distributed oncall triage queue) Sep 11, 2018
@teng-li teng-li requested a review from pietern September 11, 2018 08:36

@apaszke apaszke left a comment


I'm not an expert in our build system or c10d, but it looks OK after a cursory look. I have some comments on things that could be improved. It's not very nice that we have to sprinkle that many ifdefs, and I'm not sure whether that will work fine with our binaries, so it might be better to consult someone else too.

endif()
else()
message(STATUS "Building C10D without CUDA support")
endif()

This comment was marked as off-topic.

add_definitions(-DUSE_CUDA=1)
endif()
else()
message(STATUS "Building C10D without CUDA support")

This comment was marked as off-topic.

#ifdef USE_CUDA
,
cuda_(false)
#endif

This comment was marked as off-topic.

for (size_t i = 0; i < srcSizes.size(); i++) {
#ifdef USE_CUDA
deviceGuard.set_index(key.type->is_cuda() ? key.devices[i] : -1);
#else

This comment was marked as off-topic.

c10d_add_test(ProcessGroupGlooAsyncTest.cpp c10d c10d_cuda_test)
if(DISTRIBUTED_NCCL_FOUND)
c10d_add_test(ProcessGroupNCCLTest.cpp c10d c10d_cuda_test)
endif()

This comment was marked as off-topic.

if(USE_CUDA AND CUDA_FOUND)
cuda_add_library(c10d_cuda_test CUDATest.cu)
target_link_libraries(c10d_cuda_test c10d)
endif()

This comment was marked as off-topic.


@facebook-github-bot facebook-github-bot left a comment


teng-li has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

petrex pushed a commit to petrex/pytorch that referenced this pull request Sep 12, 2018
* master: (165 commits)
  Aibench for asr decoder
  Explicitly set locale on docs build. (pytorch#11595)
  Documentation for debugging JIT
  Fused weightnorm for ATen (pytorch#10842)
  Move Type, Tensor, TensorMethods to core.
  Add reminder % to the jit
  Fix reloading modules back into python (pytorch#11552)
  Add trigonometry functions to docs/source/onnx.rst
  Add EndToEndHybridModel CUDA tests (pytorch#11544)
  minor formatting error log (pytorch#11528)
  Warn that export+import module always load onto the CPU (pytorch#11485)
  caffe2::StorageImpl use at::DataPtr (pytorch#11282)
  Sync all libnccl soversions, not just libnccl.so.1 (pytorch#11575)
  Document BatchNorm and update default behavior (pytorch#11484)
  Typo fix in randomness.rst (pytorch#11571)
  Move some bmm/baddbmm to ATen (pytorch#11292)
  Make c10d test work on CPU only build (pytorch#11567)
  Clean up some C++ cruftiness in the script lexer.
  Allow setting deletion constant
  Make C10d support CPU only build (pytorch#11513)
  ...
@teng-li teng-li deleted the c10d_cpu branch October 4, 2018 23:43
@ezyang ezyang added the merged label Jun 26, 2019