Add a CONTRIBUTING.md for the distributed package. #44224
Conversation
The purpose of this file is to help developers on PT distributed get up to speed on the code structure and layout of PT Distributed. Differential Revision: [D23548377](https://our.internmc.facebook.com/intern/diff/D23548377/)
💊 CI failures summary: as of commit b5c9abf, 1 extra GitHub check failed (ci.pytorch.org). See the Dr. CI page for details.
This is great! Shall we include some information on distributed-specific development tips, such as how to run the tests for RPC (maybe for the various backends as well), DDP, and the collective comm APIs?
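For concreteness, here is a minimal sketch of the kind of collective comm smoke test such tips might cover: two processes all-reducing a tensor over the gloo backend. The loopback address, port, backend choice, and world size are illustrative assumptions, not the test suite's actual configuration.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def run(rank, world_size):
    # Rendezvous settings are assumptions for this sketch.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    # "gloo" works on CPU-only machines; "nccl" would exercise the GPU path.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    t = torch.ones(1) * (rank + 1)
    dist.all_reduce(t)  # default op is SUM: 1 + 2 = 3 across the two ranks
    assert t.item() == 3.0
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(run, args=(2,), nprocs=2)
```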
> ### Distributed Data Parallel
>
> DDP is implemented as a module in [distributed.py](../nn/parallel/distributed.py), with some of the core functions implemented in [reducer.cpp](../csrc/distributed/c10d/reducer.cpp) and [comm.cpp](../csrc/distributed/c10d/comm.cpp). Gradient synchronization occurs in the backward pass, triggered as autograd hooks.
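To make the hook-based synchronization concrete, here is a minimal sketch of DDP in action; the process-group setup (loopback address, port, gloo backend, world size of 2) is an illustrative assumption:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"  # rendezvous settings are assumptions
    os.environ["MASTER_PORT"] = "29501"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    # Wrapping the module is where the Reducer registers its autograd hooks.
    model = DDP(torch.nn.Linear(8, 4))
    out = model(torch.randn(2, 8))
    # backward() fires the hooks; the reducer buckets and all-reduces gradients,
    # so every rank ends up with identical averaged grads.
    out.sum().backward()
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(run, args=(2,), nprocs=2)
```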
Should we link to DDP design (https://pytorch.org/docs/stable/notes/ddp.html) and maybe do the same for RPC / Dist Autograd? I guess we may not need to since these links are available from the dist_overview page.
The dist_overview page does cover this and that's why I didn't mention a lot of design details here.
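For readers who reach this thread without that context, here is a minimal sketch of the RPC API mentioned above; the worker names, world size, and rendezvous environment variables are illustrative assumptions:

```python
import os
import torch
import torch.distributed.rpc as rpc
import torch.multiprocessing as mp

def run(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"  # rendezvous settings are assumptions
    os.environ["MASTER_PORT"] = "29502"
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)
    if rank == 0:
        # Synchronously run torch.add on worker1 and fetch the result back.
        result = rpc.rpc_sync("worker1", torch.add, args=(torch.ones(2), 3))
        print(result)  # tensor([4., 4.])
    rpc.shutdown()  # blocks until every worker has finished outstanding work

if __name__ == "__main__":
    mp.spawn(run, args=(2,), nprocs=2)
```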
Codecov Report

```
@@           Coverage Diff                            @@
##  gh/pritamdamania87/159/base    #44224      +/-   ##
========================================================
  Coverage         69.25%         69.25%
========================================================
  Files               381            381
  Lines             47580          47580
========================================================
+ Hits              32952          32953       +1
+ Misses            14628          14627       -1
```

Continue to review the full report at Codecov.
This pull request has been merged in a2a81e1.
Stack from ghstack:
The purpose of this file is to help developers on PT distributed get up to speed on the code structure and layout of PT Distributed.
Differential Revision: D23548377