Add gpu guard for broadcast_coalesce #5655
Merged
This patch fixes a bug triggered by #5182 when the model has multiple layers and DDP is run on a single node with a subset of GPUs per process.
For example, as in the test, we run 2 processes on an 8-GPU node, and all 8 GPUs are visible to both processes. We create the DDP model with `nn.parallel.DistributedDataParallel(model_DDP, device_ids=gpu_subset)`, where `gpu_subset` is 0,1,2,3 for process 1 and 4,5,6,7 for process 2. `utils::flatten_dense_tensors(chunk.tensors)` creates a new tensor that is a flattened version of the layer weights. Without this patch, that tensor is allocated on the default GPU 0, even though all layer weights for process 2 are on GPU 4; this later errors out when the broadcast requires the tensor to be on GPU 4 for process 2.
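A minimal sketch of the setup described above, assuming a hypothetical `run(rank)` entry point, a toy model, and the PR-era DDP API that accepts multiple `device_ids` per process; the actual test differs:

```python
import torch.nn as nn
import torch.distributed as dist

def run(rank, world_size=2):
    # Both processes see all 8 GPUs; each one drives its own subset.
    dist.init_process_group(backend='nccl', init_method='tcp://127.0.0.1:23456',
                            rank=rank, world_size=world_size)
    gpu_subset = [0, 1, 2, 3] if rank == 0 else [4, 5, 6, 7]

    # The model's weights live on the first GPU of the subset
    # (GPU 0 for process 1, GPU 4 for process 2).
    model = nn.Linear(8, 8).cuda(gpu_subset[0])
    model_DDP = nn.parallel.DistributedDataParallel(model, device_ids=gpu_subset)
    return model_DDP
```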
The GPU guard inside the for loop has nothing to do with the current bug; I thought it would be good to add it as a safety guard.
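A minimal sketch of the guard pattern in Python, for illustration only (the actual change lives in the C++ `broadcast_coalesce` path; `coalesce_chunk` and its placement are assumptions):

```python
import torch
from torch._utils import _flatten_dense_tensors

def coalesce_chunk(chunk_tensors):
    # Without a device guard the flat buffer can land on the default GPU 0,
    # even when every tensor in the chunk lives on, say, GPU 4.
    # Switching to the chunk's device keeps the flattened tensor co-located
    # with the weights it flattens, so the subsequent broadcast succeeds.
    with torch.cuda.device(chunk_tensors[0].get_device()):
        return _flatten_dense_tensors(chunk_tensors)
```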
@apaszke