
@pritamdamania87
Contributor

@pritamdamania87 pritamdamania87 commented Sep 4, 2020

Stack from ghstack:

The purpose of this file is to help developers on PT Distributed get
up to speed on the code structure and layout for PT Distributed.

Differential Revision: D23548377

The purpose of this file is to help developers on PT Distributed get
up to speed on the code structure and layout for PT Distributed.

Differential Revision: [D23548377](https://our.internmc.facebook.com/intern/diff/D23548377/)

[ghstack-poisoned]
pritamdamania87 pushed a commit that referenced this pull request Sep 4, 2020
Pull Request resolved: #44224

The purpose of this file is to help developers on PT Distributed get
up to speed on the code structure and layout for PT Distributed.
ghstack-source-id: 111483253

Differential Revision: [D23548377](https://our.internmc.facebook.com/intern/diff/D23548377/)
@dr-ci

dr-ci bot commented Sep 4, 2020

💊 CI failures summary and remediations

As of commit b5c9abf (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-CircleCI failure(s)

Extra GitHub checks: 1 failed


ci.pytorch.org: 1 failed



@rohan-varma
Contributor

This is great! Shall we include some information on distributed-specific development tips, such as how to run the tests for RPC (maybe for the various backends as well), DDP, and the collective comm APIs?


### Distributed Data Parallel

DDP is implemented as a module in [distributed.py](../nn/parallel/distributed.py), with some of the core functions implemented in [reducer.cpp](../csrc/distributed/c10d/reducer.cpp) and [comm.cpp](../csrc/distributed/c10d/comm.cpp). Gradient synchronization happens during the backward pass and is triggered by autograd hooks.
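The hook-driven gradient synchronization can be illustrated with a small, framework-free Python simulation. This is a hypothetical sketch, not the logic in reducer.cpp: `ToyReducer`, `grad_ready_hook`, and `allreduce_mean` are illustrative names, each "gradient" is a single float, and the all-reduce is simulated by averaging in-process. It models the idea that DDP groups gradients into buckets, a per-parameter hook fires as each gradient becomes ready, and a bucket is averaged across replicas as soon as all of its gradients are ready.

```python
from typing import Dict, List


def allreduce_mean(values: List[float]) -> float:
    """Stand-in for a collective all-reduce: average one value across replicas."""
    return sum(values) / len(values)


class ToyReducer:
    """Hypothetical, framework-free model of hook-driven, bucketed gradient sync."""

    def __init__(self, num_replicas: int, buckets: List[List[str]]) -> None:
        self.num_replicas = num_replicas
        self.buckets = buckets
        # Map each parameter name to the index of the bucket that holds it.
        self.bucket_of = {p: i for i, bucket in enumerate(buckets) for p in bucket}
        # Per-parameter list of gradients, one slot per replica.
        self.grads: Dict[str, List[float]] = {}
        # Number of (replica, parameter) gradients still missing per bucket.
        self.pending = [len(bucket) * num_replicas for bucket in buckets]
        self.reduced: Dict[str, float] = {}
        self.order: List[int] = []  # bucket indices in the order they were reduced

    def grad_ready_hook(self, replica: int, param: str, grad: float) -> None:
        """Called when `replica` finishes computing the gradient of `param`."""
        slots = self.grads.setdefault(param, [0.0] * self.num_replicas)
        slots[replica] = grad
        b = self.bucket_of[param]
        self.pending[b] -= 1
        if self.pending[b] == 0:
            # Every replica has produced every gradient in this bucket, so it is
            # reduced immediately instead of waiting for the whole backward pass.
            for p in self.buckets[b]:
                self.reduced[p] = allreduce_mean(self.grads[p])
            self.order.append(b)


# Two replicas, same model, different local data -> different local gradients.
reducer = ToyReducer(
    num_replicas=2,
    buckets=[["fc2.weight", "fc2.bias"], ["fc1.weight", "fc1.bias"]],
)
local_grads = [
    {"fc2.weight": 1.0, "fc2.bias": 2.0, "fc1.weight": 3.0, "fc1.bias": 4.0},
    {"fc2.weight": 3.0, "fc2.bias": 4.0, "fc1.weight": 5.0, "fc1.bias": 6.0},
]
# Backward produces gradients roughly in reverse order of the forward pass,
# so the bucket holding the last layer is complete (and reduced) first.
for param in ["fc2.weight", "fc2.bias", "fc1.weight", "fc1.bias"]:
    for replica in range(2):
        reducer.grad_ready_hook(replica, param, local_grads[replica][param])

print(reducer.order)  # [0, 1]
print(reducer.reduced["fc1.bias"])  # 5.0
```

Reducing each bucket as soon as it is complete is what lets DDP overlap communication with the remainder of the backward pass. The real reducer also handles details this toy ignores, such as parameters that receive no gradient in a given iteration.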
Contributor


Should we link to the DDP design notes (https://pytorch.org/docs/stable/notes/ddp.html) and maybe do the same for RPC / Dist Autograd? Then again, we may not need to, since these links are available from the dist_overview page.

Contributor Author


The dist_overview page does cover this, which is why I didn't include many design details here.

pritamdamania87 pushed a commit that referenced this pull request Sep 9, 2020
Pull Request resolved: #44224

The purpose of this file is to help developers on PT Distributed get
up to speed on the code structure and layout for PT Distributed.
ghstack-source-id: 111644842

Differential Revision: [D23548377](https://our.internmc.facebook.com/intern/diff/D23548377/)
@codecov

codecov bot commented Sep 9, 2020

Codecov Report

Merging #44224 into gh/pritamdamania87/159/base will increase coverage by 0.00%.
The diff coverage is n/a.


@@                     Coverage Diff                      @@
##           gh/pritamdamania87/159/base   #44224   +/-   ##
============================================================
  Coverage                        69.25%   69.25%           
============================================================
  Files                              381      381           
  Lines                            47580    47580           
============================================================
+ Hits                             32952    32953    +1     
+ Misses                           14628    14627    -1     
Impacted Files Coverage Δ
torch/testing/_internal/expecttest.py 78.57% <0.00%> (+1.02%) ⬆️

Last update 43e38d6...b5c9abf.

@facebook-github-bot
Contributor

This pull request has been merged in a2a81e1.

@facebook-github-bot facebook-github-bot deleted the gh/pritamdamania87/159/head branch September 14, 2020 14:17