
Conversation

@ssnl (Collaborator) commented Jan 30, 2018

Commits:

  1. Renames ATen/Check.h to ATen/TensorUtils.h and adds a new method, at::maybe_data_ptr (#4851: Make UndefinedTensor's data_ptr() return nullptr).
  2. Changes THNN BN code so that running_mean and running_var can be optional when training=True (#4509: instance norm without the computation of running statistics).
  3. ATen and cuDNN changes, still for #4509.
  4. Python nn.* changes, including renaming InstanceNorm*d's use_running_stats (#4444: Fix setting using running stats in InstanceNorm*d) to the new track_running_stats option shared with BN. Improves IN and BN docs (usage sketch after this list).
  5. Adds tests for the new option for IN and BN. Improves other IN tests.
  6. Adds Layer Normalization (#1959: [Feature Request] Layer Normalization).
  7. Fixes LRN doc.
  8. Functional interface for IN and LN.
  9. Tests for LN.
  10. Fixes BN double backward returning an undefined tensor when it shouldn't.
  11. Fixes JIT tests that use wrong-dim inputs for BN.
  12. Adds/improves BN, IN and LN GPU tests with half type.
  13. Updates IN and BN docs to be consistent with conv notation; fixes ONNX failures.
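
For orientation, here is a minimal usage sketch of the options these commits introduce, assuming a PyTorch build that includes this PR (shapes chosen only for illustration):

```python
import torch
import torch.nn as nn

x = torch.randn(20, 3, 35)  # (N, C, L)

# InstanceNorm1d can now skip running statistics entirely
inorm = nn.InstanceNorm1d(3, track_running_stats=False)

# BatchNorm1d still tracks running statistics by default
bnorm = nn.BatchNorm1d(3, track_running_stats=True)

# LayerNorm normalizes over the trailing dims given by normalized_shape
lnorm = nn.LayerNorm([3, 35])

for m in (inorm, bnorm, lnorm):
    assert m(x).shape == x.shape  # all of these preserve the input shape
```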

@pytorchbot (Collaborator)

@ssnl, thanks for your PR! We identified @zdevito to be a potential reviewer.


@Kaixhin (Contributor) commented Jan 30, 2018

Sorry but could you also fix the LRN docs with this PR? Just need to remove the extra colon here.

@ssnl force-pushed the layer_norm branch 2 times, most recently from f989db0 to c68b7e5 on January 30, 2018 17:47
@ssnl force-pushed the layer_norm branch 3 times, most recently from 142707f to 1f3df3d on January 30, 2018 21:23
@ssnl changed the title from [WIP] Layer Normalization to [ready] Layer Normalization on Jan 30, 2018
@ssnl changed the title from [ready] Layer Normalization to [WIP] Layer Normalization on Jan 30, 2018
@ssnl force-pushed the layer_norm branch 12 times, most recently from 45b8220 to 9beb186 on January 31, 2018 20:15
@ssnl (Collaborator, Author) commented Jan 31, 2018

@pytorchbot retest this please

1 similar comment
@ssnl (Collaborator, Author) commented Jan 31, 2018

@pytorchbot retest this please

@ssnl (Collaborator, Author) commented Feb 1, 2018

@pytorchbot retest this please

self.assertEqual(grad1, grad2)

# track_running_stats=False
module = nn.BatchNorm1d(3, track_running_stats=False).type(test_type)


"""Applies instance normalization over an input. The implementation is
based on batch_norm, in which we do reshape, batchnorm, and reshape again.
def batch_norm(input, running_mean, running_var, weight=None, bias=None,
training=False, momentum=0.1, eps=1e-5):
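
For readers skimming the diff, here is a rough sketch of the reshape-batch_norm-reshape idea the docstring above describes. This is an illustrative re-derivation, not the PR's actual implementation, and it omits affine weight/bias handling:

```python
import torch
import torch.nn.functional as F

def instance_norm_via_batch_norm(x, eps=1e-5):
    # x: (N, C, *spatial). Fold N into the channel dimension so that
    # batch_norm computes separate statistics for every (n, c) slice.
    n, c = x.shape[0], x.shape[1]
    folded = x.contiguous().view(1, n * c, *x.shape[2:])
    out = F.batch_norm(folded, running_mean=None, running_var=None,
                       training=True, momentum=0.0, eps=eps)
    return out.view_as(x)

# e.g. instance_norm_via_batch_norm(torch.randn(4, 3, 8, 8))
```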


Args:
num_features: num_features from an expected input of size
`[batch_size x num_features (x width)]`
:math:`(N, C, L)` or :math:`(N, L)`


Args:
num_features: num_features from an expected input of size
`[batch_size x num_features x width]`
:math:`(N, C, L)` or :math:`(N, L)`


@zou3519 (Contributor) left a comment

LGTM!

@soumith merged commit 1848cad into pytorch:master on Feb 22, 2018
@ssnl deleted the layer_norm branch on February 22, 2018 16:57
@ruotianluo (Contributor) commented Feb 22, 2018

@ssnl (Collaborator, Author) commented Feb 22, 2018

@ruotianluo definitely not. This phenomenon is not unique to these two classes, though. I'll figure out what's wrong.

@meder411 commented Feb 23, 2018

Is there a particular reason that the normalized_shape input must be a tensor? For fully connected layers or other 1D inputs, it seems that the constructor ought to be able to accept an integer. As of now, I am using

nn.LayerNorm(torch.LongTensor([256]))

but why not

nn.LayerNorm(256)

as one would with BatchNorm. From the paper, there doesn't seem to be any reason you wouldn't use Layer Normalization in this way. It's a minor change, but it may be a more seamless interface.

@ssnl (Collaborator, Author) commented Feb 23, 2018

@meder411 It doesn't need to be a tensor. It can be a list, tuple, torch.Size, etc., basically anything that can be accepted by torch.Size's constructor. In fact, the doc never suggested passing in a tensor, so I'm not sure where you get that idea.

The idea is to be able to normalize over multiple dimensions. Hence the argument is called normalized_shape. The doc explains it pretty clearly I think.

@Kaixhin (Contributor) commented Feb 23, 2018

@meder411 While the docs do look correct to me for describing normalized_shape, having it accept an integer for the most common use case does seem worthwhile. If you're willing to submit a PR for this, that'd be great; otherwise raise an issue (please tag me in it) and someone will address it.
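
To make the discussion concrete, a small sketch of the forms normalized_shape can take (list, tuple, or torch.Size per the docs; later PyTorch releases also accept a plain int, which is the change suggested above):

```python
import torch
import torch.nn as nn

x = torch.randn(32, 256)              # (batch, features)
ln = nn.LayerNorm([256])              # list, tuple, or torch.Size all work
y = ln(x)

img = torch.randn(8, 3, 10, 10)
ln_img = nn.LayerNorm(img.shape[1:])  # normalize over the last three dims
z = ln_img(img)
```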

self.running_mean.zero_()
self.running_var.fill_(1)
if self.elementwise_affine:
    self.weight.data.uniform_()


jamesr66a pushed a commit to jamesr66a/pytorch that referenced this pull request Feb 23, 2018
* at::maybe_data_ptr and Check.h => TensorUtils.h

* THNN support for optional BN running_*

* ATen support for optional BN running_*

* Python nn.* support for optional BN running_*; Improve IN and BN doc

* Add tests for IN and BN new option

* Layer Norm

* Fix LRN doc

* functional interface for LN and IN

* Layer norm tests

* fix BN double backward returning undefined tensors

* fix jit test using wrong dim inputs for BN

* add/improve BN, IN and LN GPU tests with half type

* Update docs to be consistent with Conv notation
Fix onnx
Clarified onnx symbolic wrapper

* fix typo

* Address comments

@JustinLin610

How can I use it with nn.LSTM or nn.GRU? It would be much more helpful if LN could be used with those two classes.

@ssnl (Collaborator, Author) commented Mar 30, 2018

@JustinLin610 Using the *Cell classes is the only way.

@calclavia

@ssnl I don't think there's a way to modify the cell? The only way would be to copy and paste the code for the nn.GRUCell and then add layernorm manually. Would be nice if there's an option to easily enable it...

@jinserk commented Sep 5, 2018

I'd like to apply LayerNorm to a multi-layer LSTM but have no idea how to use the LSTMCell classes. @ssnl, could you point me to a simple example of this? LayerNorm is quite new to PyTorch, so it is difficult to find a good example.
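
As a rough illustration of the manual cell loop being discussed here, the sketch below applies nn.LayerNorm to the hidden state after each nn.LSTMCell step. This is only one possible placement of the normalization (the original paper normalizes inside the gate computations), and multi-layer or bidirectional variants need extra plumbing:

```python
import torch
import torch.nn as nn

class LayerNormLSTM(nn.Module):
    """Single-layer LSTM that layer-normalizes the hidden state each step."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.ln = nn.LayerNorm(hidden_size)
        self.hidden_size = hidden_size

    def forward(self, x):                      # x: (seq_len, batch, input_size)
        batch = x.size(1)
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)
        outputs = []
        for x_t in x.unbind(0):                # step through time manually
            h, c = self.cell(x_t, (h, c))
            h = self.ln(h)                     # normalize the hidden state
            outputs.append(h)
        return torch.stack(outputs, 0), (h, c)
```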

@MBAnslow commented Apr 9, 2019

@ssnl I agree with jinserk. I've looked far and wide for a fully featured example of LSTMs and GRUs that supports layer normalisation, including bidirectionality and multiple layers, and I haven't really found anything. There are some incomplete LSTM versions around, but they don't seem to be feature-complete or optimised very well.

@zou3519 (Contributor) commented Apr 9, 2019

