Support AMP in nn.parallel #43102
Conversation
💊 CI failures summary and remediations
As of commit 2072a64 (more details on the Dr. CI page):
🕵️ 4 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
mrshenli
left a comment
Hey @anoidgit, thanks for adding this. Shall we add a test to cover the new code?
@mrshenli Hi, with pleasure. It would be better to have a corresponding test, but I do not know how to write one, and I also cannot find the existing test code for this.
Hey @anoidgit, you can add it to pytorch/test/distributed/test_data_parallel.py (lines 80 to 96 at commit ccc831a show an example).
Hi @mrshenli, thanks a lot for your help. I wrote the test following that example; please check that it is correct.
mrshenli
left a comment
LGTM! Just a minor comment. Thanks for contributing!
    expected_outputs = (expected1, expected2)

    # each input can be either a collection of positional arguments
    #                          or an object representing the single argument
any reason for the long indent before "or"?
Thanks for pointing this out. It also looks strange to me, but I do not know the reason; I was just following the example. If it is not right, please help fix it.
That might be a leftover from debugging. No worries, I will land this as is and submit a PR to fix both.
facebook-github-bot
left a comment
@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Code changes look good, thanks! The DP + autocast documentation should also be updated, but I can do that myself in a separate PR. Updating the documentation is not urgent: the guidance in the existing docs will continue to work after this PR; it will just become overkill.
@mcarilli many thanks for your efforts :)
Propagate the autocast state into the worker threads spawned by parallel_apply, so there is no need to decorate model implementations.
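The core issue is that the autocast flag is thread-local, while parallel_apply runs each replica in its own thread, so a worker thread starts with autocast disabled even when the caller enabled it. The sketch below illustrates the hand-off pattern in plain Python with a thread-local flag; all names (`is_autocast_enabled`, `set_autocast_enabled`, `parallel_apply`) are illustrative stand-ins, not the real torch internals.

```python
import threading

# Stand-in for the thread-local autocast flag (illustrative, not torch's API).
_state = threading.local()

def is_autocast_enabled():
    return getattr(_state, "enabled", False)

def set_autocast_enabled(flag):
    _state.enabled = flag

def parallel_apply(fns):
    """Run each fn in its own thread, forwarding the caller's autocast state.

    Without the explicit hand-off, each worker thread would see the default
    (disabled) state, because the flag is thread-local.
    """
    caller_state = is_autocast_enabled()  # captured on the calling thread
    results = [None] * len(fns)

    def worker(i, fn):
        set_autocast_enabled(caller_state)  # re-apply in the worker thread
        results[i] = fn()

    threads = [threading.Thread(target=worker, args=(i, fn))
               for i, fn in enumerate(fns)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    set_autocast_enabled(True)
    # Each "replica" just reports whether autocast is on in its thread.
    print(parallel_apply([is_autocast_enabled, is_autocast_enabled]))
```

With the hand-off in place, both workers report True; without the `set_autocast_enabled(caller_state)` line they would report False, which is exactly why models previously had to decorate their forward methods with autocast themselves.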