Skip to content

Conversation

@ezyang
Copy link
Contributor

@ezyang ezyang commented Sep 14, 2018

Fixes #11683.

Signed-off-by: Edward Z. Yang ezyang@fb.com

@ezyang
Copy link
Contributor Author

ezyang commented Sep 14, 2018

CC @PetrochukM

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

@ssnl
Copy link
Collaborator

ssnl commented Sep 25, 2018

@ezyang @apaszke How should we proceed with this?

@ezyang
Copy link
Contributor Author

ezyang commented Sep 27, 2018

I'm going to add a test for saving a nccl-parallelized model, and then we'll call it a day.

@ezyang ezyang force-pushed the pr/dont-serialize-hooks branch from 6b2ec7c to d18b2d7 Compare September 28, 2018 13:58
Copy link
Contributor

@apaszke apaszke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment + accidental submodule update

This comment was marked as off-topic.

This comment was marked as off-topic.

@ezyang ezyang changed the title Don't serialize hooks when transmitting tensors with multiprocessing. Don't serialize hooks Oct 1, 2018
@ezyang ezyang force-pushed the pr/dont-serialize-hooks branch 2 times, most recently from 2907fd1 to 4379934 Compare October 1, 2018 17:04
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Copy link
Contributor Author

ezyang commented Oct 1, 2018

This PR is still missing warning reporting for when we would have serialized a hook but don't, but all the other pieces are here.

@ezyang ezyang force-pushed the pr/dont-serialize-hooks branch 2 times, most recently from 9cb0292 to 455019a Compare October 5, 2018 04:04
@ezyang
Copy link
Contributor Author

ezyang commented Oct 5, 2018

Warnings added

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Copy link
Contributor Author

ezyang commented Oct 5, 2018

@pytorchbot retest this please

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

ezyang added 5 commits October 8, 2018 15:36
Fixes pytorch#11683.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
ezyang added 2 commits October 8, 2018 15:36
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
@ezyang ezyang force-pushed the pr/dont-serialize-hooks branch from 791a4ec to f8cac60 Compare October 8, 2018 23:26
Copy link
Contributor

@apaszke apaszke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests leak resources

self.bucket_events[bucket_idx][device_idx] = event
self._queue_reduction(bucket_idx)

distributed_data_parallel_hook._torch_unserializable = True

This comment was marked as off-topic.

pass

# Shut up warnings
hook._torch_unserializable = True

This comment was marked as off-topic.

# TODO: It should be possible to save the entire model,
# but this doesn't work at the moment. Update this test
# when it does work.
tmp_file = tempfile.TemporaryFile()

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

# Test that saving and loading work
# gloo serialization doesn't work, see #12261
if BACKEND != "gloo":
tmp_file = tempfile.TemporaryFile()

This comment was marked as off-topic.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Copy link
Contributor Author

ezyang commented Oct 16, 2018

@apaszke Ready to go?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Can't pickle local object 'DistributedDataParallel._register_nccl_grad_hook.

4 participants