Skip to content

Conversation

@ezyang
Copy link
Contributor

@ezyang ezyang commented Sep 6, 2018

In #9466 I got rid of storage views and eliminated all places where
they were used... OR SO I THOUGHT. In actuality, under certain
conditions (specifically, if you trained a CUDA multiprocessing model
shared over CUDA IPC and then serialized your parameters), you could
also serialize storage slices to the saved model format. In #9466,
I "fixed" the case when you loaded the legacy model format (really,
just unshared the storages--not strictly kosher but if you aren't
updating the parameters, shouldn't matter), but NOT the modern model format, so
such models would fail.

So, I could have applied the legacy model format fix too, but
hyperfraise remarked that he had applied a fix that was effectively
the same as unsharing the storages, but it had caused his model to
behave differently. So I looked into it again, and realized that
using a custom deleter, I could simulate the same behavior as old
storage slices. So back they come.

In principle, I could also reimplement storage views entirely using
our allocators, but I'm not going to do that unless someone really
really wants it.

Fixes #10120.

Signed-off-by: Edward Z. Yang ezyang@fb.com

In pytorch#9466 I got rid of storage views and eliminated all places where
they were used... OR SO I THOUGHT.  In actuality, under certain
conditions (specifically, if you trained a CUDA multiprocessing model
shared over CUDA IPC and then serialized your parameters), you could
also serialize storage slices to the saved model format.  In pytorch#9466,
I "fixed" the case when you loaded the legacy model format (really,
just unshared the storages--not strictly kosher but if you aren't
updating the parameters, shouldn't matter), but NOT the modern model format, so
such models would fail.

So, I could have applied the legacy model format fix too, but
hyperfraise remarked that he had applied a fix that was effectively
the same as unsharing the storages, but it had caused his model to
behave differently.  So I looked into it again, and realized that
using a custom deleter, I could simulate the same behavior as old
storage slices.  So back they come.

In principle, I could also reimplement storage views entirely using
our allocators, but I'm not going to do that unless someone really
really wants it.

Fixes pytorch#10120.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@ailzhang ailzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the fix!

PenghuiCheng pushed a commit to PenghuiCheng/pytorch that referenced this pull request Sep 11, 2018
Summary:
In pytorch#9466 I got rid of storage views and eliminated all places where
they were used... OR SO I THOUGHT.  In actuality, under certain
conditions (specifically, if you trained a CUDA multiprocessing model
shared over CUDA IPC and then serialized your parameters), you could
also serialize storage slices to the saved model format.  In pytorch#9466,
I "fixed" the case when you loaded the legacy model format (really,
just unshared the storages--not strictly kosher but if you aren't
updating the parameters, shouldn't matter), but NOT the modern model format, so
such models would fail.

So, I could have applied the legacy model format fix too, but
hyperfraise remarked that he had applied a fix that was effectively
the same as unsharing the storages, but it had caused his model to
behave differently.  So I looked into it again, and realized that
using a custom deleter, I could simulate the same behavior as old
storage slices.  So back they come.

In principle, I could also reimplement storage views entirely using
our allocators, but I'm not going to do that unless someone really
really wants it.

Fixes pytorch#10120.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: pytorch#11314

Reviewed By: ailzhang

Differential Revision: D9671966

Pulled By: ezyang

fbshipit-source-id: fd863783d03b6a6421d6b9ae21ce2f0e44a0dcce
@ezyang ezyang added the merged label Jun 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

load model with pytorch0.4 with model saved with pytorch0.3

3 participants