Fix more spectral norm bugs #13350
Conversation
|
@YaoshengFu This PR will fix the spectral norm bug you see. |
|
I really appreciate your work. I have one question to check my understanding.
Is this correct? |
|
@crcrpar Almost!
|
colesbury
left a comment
LGTM, but I'm not very familiar with the spectral norm code and feel a bit lost.
If @t-vi has time and is familiar with this, his review may be helpful.
|
Sorry for my late response. The changes are good, but I have some questions about the test code. |
|
Thank you for your comments @crcrpar :) |
|
I am still having a problem with spectral norm complaining about in-place changes when it is followed by batch normalization and run across 2 or more GPUs using DataParallel. This architecture arises in BigGAN (https://arxiv.org/abs/1809.11096). Interestingly, this is not an issue with group normalization, instance normalization, or no normalization. It is also not an issue on a single GPU. |
|
So, to sort this out a bit: how many cases do we have? with/without DP * eval/training * weight.requires_grad=True/False * ???. |
|
@zmurez even with this patch? |
|
@t-vi Yeah, I think those 8 cases are all we need to consider. |
I think so. It complained regardless of normalization and the number of GPUs prior to adding this patch. However, I just copied the relevant lines into my own spectral_norm.py file instead of pulling the entire branch, so it is possible I missed something, but it seems unlikely since all the other cases are fixed. Note that I am still using the latest stable release. To get this patch to work I also grabbed a copy of the normalize function and implemented my own chain_matmul (a single if statement in this case, with 3 matrices). Does this bug exist for you? If not, I guess I will have to consider updating to the unstable version. Thanks |
|
@zmurez It is possible that other changes are needed. Do you have a small repro script I can try on my build? |
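A minimal repro along the lines of the report above might look like the sketch below; the layer sizes, input shape, and two-GPU device list are illustrative assumptions rather than details from the thread:

```python
# Hedged repro sketch: layer sizes, input shape, and the device list are
# illustrative assumptions, not details taken from the thread.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

model = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 8, kernel_size=3, padding=1)),
    nn.BatchNorm2d(8),  # running stats are updated in-place during forward
    nn.ReLU(),
)
model = nn.DataParallel(model.cuda(), device_ids=[0, 1])  # needs at least 2 GPUs

x = torch.randn(4, 3, 16, 16, device="cuda")
out = model(x)
# Before the fixes in this PR and #13594, this backward could fail with
# "one of the variables needed for gradient computation has been modified
# by an inplace operation".
out.sum().backward()
```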
|
t-vi
left a comment
Thanks for adding all the new tests! They seem to be comprehensive and I think the patch is good with them.
facebook-github-bot
left a comment
@ssnl is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
It seems that when spectral norm is applied to a conv module, u is randomly initialized. Then v is computed such that the invariant holds.
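For reference, a rough sketch of the power-iteration relationship this comment refers to; the shapes are assumptions, and the actual logic lives in torch/nn/utils/spectral_norm.py:

```python
# Hedged sketch of the power-iteration invariant (assumed shapes; not the
# actual module internals).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W = torch.randn(8, 3 * 3 * 3)                # e.g. a Conv2d(3, 8, 3) weight, flattened to (out, in*k*k)

u = F.normalize(torch.randn(8), dim=0)       # u starts out random ...
v = F.normalize(torch.mv(W.t(), u), dim=0)   # ... and v is derived from it: v = normalize(W^T u)
u = F.normalize(torch.mv(W, v), dim=0)       # one power-iteration step: u = normalize(W v)

sigma = torch.dot(u, torch.mv(W, v))         # estimate of the largest singular value
W_sn = W / sigma                             # the spectrally normalized weight
```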
|
|
@zmurez Good point. I'll think about ways to fix it. |
|
For the initialization, I think it should not be terribly important whether u or v is initialized randomly, so that could likely be changed. For loading the state dict: this only happens when you load an "old version" state dict. How about warning about the performance cost and (potentially) offering a switch to re-compute u with a forward pass instead of solving for v? (I think it might be safe to "do the right, slow thing" by default rather than the other way round.) |
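One way "solving for v" could look, as a hedged sketch; the least-squares form below (a three-matrix product, which matches the chain_matmul mentioned earlier) is an assumption about the approach, not the exact hook code:

```python
# Hedged sketch of "solving for v" from a saved u (hypothetical helper, not the
# exact load_state_dict hook code).
import torch
import torch.nn.functional as F

def solve_v(weight_mat, u, eps=1e-12):
    """Recover a v consistent with u, i.e. u is (approximately) normalize(W v)."""
    W = weight_mat
    # Least-squares solution of "W v proportional to u": v = (W^T W)^-1 W^T u,
    # then normalize. The three-matrix product is where chain_matmul comes in.
    v = torch.chain_matmul(torch.pinverse(W.t() @ W), W.t(), u.unsqueeze(1)).squeeze(1)
    return F.normalize(v, dim=0, eps=eps)
```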
|
I agree with t-vi. Random initialization of |
facebook-github-bot
left a comment
@ssnl has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
facebook-github-bot
left a comment
@ssnl is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: In `broadcast_coalesced`, since multiple variables can be "views" of a big flattened tensor, they can share the same version counter. However, this base flat tensor is not exposed and the variables don't share any memory locations, so sharing the counter is not necessary. Furthermore, it can cause problems: e.g., when two buffers are broadcast together in `DataParallel` and one of them is modified in-place during `forward` but the other is needed in backward, the autograd engine will complain. Fixes the bug discovered at #13350 (comment).

edit: This is a very real problem. E.g., consider using Spectral Norm + Batch Norm together.

Pull Request resolved: #13594
Differential Revision: D12967311
Pulled By: SsnL
fbshipit-source-id: 52998dbabe149f575cf0fb79e7016f0b95e4b9e5
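The scenario the summary describes could be sketched roughly as follows; the tensors are stand-ins for the buffers involved, at least two CUDA devices are assumed, and this illustrates the pre-fix behavior rather than the actual `DataParallel` code:

```python
# Hedged illustration of the pre-fix behavior described in the summary; the
# tensors are stand-ins for broadcast buffers and two CUDA devices are assumed.
import torch
import torch.cuda.comm as comm

a = torch.randn(3, device="cuda:0")  # stand-in for a buffer needed in backward (e.g. SN's v)
b = torch.randn(3, device="cuda:0")  # stand-in for a buffer updated in-place (e.g. BN's running_mean)

# DataParallel uses broadcast_coalesced under the hood to copy buffers to each replica.
(a0, b0), (a1, b1) = comm.broadcast_coalesced([a, b], devices=[0, 1])

# Before #13594, a1 and b1 were views of one big flat tensor and shared a version
# counter, so an in-place update of b1 during forward ...
b1.add_(1)
# ... also bumped the version that autograd had recorded for a1, and backward then
# failed with "one of the variables needed for gradient computation has been
# modified by an inplace operation", even though a1 itself was never changed.
```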
Problems with SN and DP after #12671:

1. In eval mode, `weight_orig` is not getting the correct gradient (SpectralNorm in eval doesn't connect grad to weight_orig, #12737). Fix: keep the `v` vector around as a buffer and always calculate `W = W_orig / (u @ W_orig @ v)`, even in eval.
2. In training mode, the `weight` buffer of the parallelized module is never updated if someone touches `weight_orig` and/or `weight` and makes them stop sharing storage, so in eval the weight used is wrong. Fix: make `weight` not a buffer anymore and always calculate it as above.
3. Fix SpectralNorm with DataParallel (#12671) changed SN to update `u` in-place to make DP work correctly, but that breaks backward through two forwards (e.g., the common GAN loss `D(real) - D(fake)`), because the vectors needed to backprop the 1st forward are changed by the 2nd forward. Fix: this PR clones `u` and `v` before using them.

To maintain BC, I added a hook interface for producing and loading state_dict. This is ugly and we should really have a better interface for spectral_norm, but for the purpose of fixing this issue I made this patch. Even with a better interface, a BC mechanism for loading legacy state_dicts would still be needed.
cc @t-vi @crcrpar
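To tie the fixes above together, here is a simplified, hedged sketch of the weight computation; the real implementation in torch/nn/utils/spectral_norm.py additionally handles dimension permutation and in-place buffer updates, so this is only an illustration of the scheme:

```python
# Hedged, simplified sketch of the weight computation after this PR; not the
# actual implementation in torch/nn/utils/spectral_norm.py.
import torch
import torch.nn.functional as F

def compute_weight(weight_orig, u, v, n_power_iterations=1, training=True, eps=1e-12):
    # Flatten the original weight to a 2-D matrix, e.g. (out, in*k*k) for a conv.
    W = weight_orig.reshape(weight_orig.size(0), -1)
    if training:
        with torch.no_grad():
            for _ in range(n_power_iterations):
                v = F.normalize(torch.mv(W.t(), u), dim=0, eps=eps)
                u = F.normalize(torch.mv(W, v), dim=0, eps=eps)
            # Clone u and v so a later forward's in-place update of the stored
            # buffers cannot invalidate this forward's graph (the D(real) - D(fake)
            # double-forward case from the description).
            u, v = u.clone(), v.clone()
    # The weight is always recomputed, even in eval, so weight_orig receives gradients.
    sigma = torch.dot(u, torch.mv(W, v))
    return weight_orig / sigma
```

Recomputing the weight on every forward, instead of caching it in a buffer, is what keeps both the eval-gradient case and the DataParallel case consistent, at the cost of a little extra compute.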