Slightly improve DistributedDataParallel (single-GPU binding) multi-process distributed training performance #4870
@csarofeen's simple DistributedDataParallel (DDP) version (with buffer syncing ON, for a fair comparison, since our DDP always syncs buffers) reaches 0.159 sec per iteration when training ResNet50 with a mini-batch size of 32 per GPU, using 8 DDP processes on a single node with 8 P100s. Our current DDP reaches 0.164 sec per iteration in the same single-node, 8-process setup.

This PR improves single-node distributed training on 8 P100s from 0.164 sec/iteration to 0.162 sec/iteration.
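For context, below is a minimal sketch (not the PR's actual benchmark script) of the single-GPU-binding setup being measured: one process per GPU, each wrapping its model in DDP with a single `device_ids` entry. The `env://` init method, rank handling, and dummy data are illustrative assumptions; only the ResNet50 model and 32 images/GPU batch size come from the numbers quoted above.

```python
import torch
import torch.distributed as dist
import torchvision.models as models
from torch.nn.parallel import DistributedDataParallel as DDP


def main(rank, world_size):
    # One process per GPU: bind this process to its own device.
    dist.init_process_group(backend="nccl", init_method="env://",
                            rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = models.resnet50().cuda(rank)
    # A single entry in device_ids is the "single-GPU binding" mode.
    model = DDP(model, device_ids=[rank])

    criterion = torch.nn.CrossEntropyLoss().cuda(rank)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # Dummy tensors standing in for this process's data shard (32 images/GPU).
    images = torch.randn(32, 3, 224, 224, device=rank)
    target = torch.randint(0, 1000, (32,), device=rank)

    for _ in range(10):  # timed iterations
        optimizer.zero_grad()
        loss = criterion(model(images), target)
        loss.backward()   # gradients are all-reduced across the processes here
        optimizer.step()
```

In this mode each of the 8 processes owns exactly one GPU, so the per-iteration time quoted above is dominated by the forward/backward compute plus the gradient all-reduce (and the buffer sync) that DDP performs during `backward()`.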