-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Don't serialize hooks #11705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't serialize hooks #11705
Conversation
|
CC @PetrochukM |
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
torch/multiprocessing/reductions.py
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
|
I'm going to add a test for saving a nccl-parallelized model, and then we'll call it a day. |
6b2ec7c to
d18b2d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment + accidental submodule update
test/test_distributed.py
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
2907fd1 to
4379934
Compare
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
This PR is still missing warning reporting for when we would have serialized a hook but don't, but all the other pieces are here. |
9cb0292 to
455019a
Compare
|
Warnings added |
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
@pytorchbot retest this please |
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
test/test_distributed.py
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
test/test_thd_distributed.py
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
torch/utils/hooks.py
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
torch/utils/hooks.py
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
Fixes pytorch#11683. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
791a4ec to
f8cac60
Compare
apaszke
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some tests leak resources
| self.bucket_events[bucket_idx][device_idx] = event | ||
| self._queue_reduction(bucket_idx) | ||
|
|
||
| distributed_data_parallel_hook._torch_unserializable = True |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
test/test_multiprocessing.py
Outdated
| pass | ||
|
|
||
| # Shut up warnings | ||
| hook._torch_unserializable = True |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
test/test_distributed.py
Outdated
| # TODO: It should be possible to save the entire model, | ||
| # but this doesn't work at the moment. Update this test | ||
| # when it does work. | ||
| tmp_file = tempfile.TemporaryFile() |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
test/test_thd_distributed.py
Outdated
| # Test that saving and loading work | ||
| # gloo serialization doesn't work, see #12261 | ||
| if BACKEND != "gloo": | ||
| tmp_file = tempfile.TemporaryFile() |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
@apaszke Ready to go? |
Fixes #11683.
Signed-off-by: Edward Z. Yang ezyang@fb.com