Group Normalization #5968

Conversation
cc @KaimingHe
Right. However, here we actually want it to be contiguous to take advantage of the cudnn path. Reshape doesn't always make it contiguous.

On Fri, Mar 23, 2018 at 14:59, Francisco Massa commented on torch/nn/functional.py:

> +        raise ValueError('Expected number of channels in input to be divisible '
> +                         'by num_groups, but got {} input and num_groups={}'
> +                         .format(input_shape, num_groups))
> +
> +    if weight is not None and (weight.dim() != 1 or weight.numel() != c):
> +        raise ValueError('Expected weight to be a vector of size equal to the '
> +                         'number of channels in input, but got {} weight and {} '
> +                         'input'.format(weight.size(), input_shape))
> +
> +    if bias is not None and (bias.dim() != 1 or bias.numel() != c):
> +        raise ValueError('Expected bias to be a vector of size equal to the '
> +                         'number of channels in input, but got {} bias and {} '
> +                         'input'.format(bias.size(), input_shape))
> +
> +    # Apply group norm
> +    input_reshaped = input.contiguous().view(1, b * g, -1)
>
> nit: this can be simplified with .reshape(1, b * g, -1), which was added in #5575 and is roughly equivalent to .contiguous().view().
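For concreteness, a small check of the distinction being made here (my own sketch, assuming current PyTorch semantics):

    import torch

    x = torch.arange(6).view(2, 3).t()   # transpose leaves x non-contiguous, shape (3, 2)
    print(x.is_contiguous())             # False

    # reshape returns a view whenever it can; here the target shape already
    # matches the current one, so no copy happens and the layout is unchanged.
    y = x.reshape(3, 2)
    print(y.is_contiguous())             # False

    # contiguous() forces a copy when needed, so the view is guaranteed contiguous.
    z = x.contiguous().view(3, 2)
    print(z.is_contiguous())             # True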
Fair. Let me make it in ATen.

On Fri, Mar 23, 2018 at 16:30, Edward Z. Yang commented on torch/nn/functional.py:

> @@ -1308,6 +1318,53 @@ def layer_norm(input, normalized_shape, running_mean, running_var,
>      return out
>
> +def group_norm(input, num_groups, weight=None, bias=None, eps=1e-5):
>
> Why not in ATen?
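For reference, the computation being moved can be sketched in a few lines of Python (an illustration only; group_norm_reference is a hypothetical name, not the PR's ATen code):

    import torch

    def group_norm_reference(x, num_groups, weight=None, bias=None, eps=1e-5):
        # Normalize each group of channels, per sample.
        b, c = x.shape[0], x.shape[1]
        assert c % num_groups == 0, 'channels must be divisible by num_groups'

        # Statistics are computed per (sample, group) over the group's
        # channels and all spatial positions.
        g = x.contiguous().view(b, num_groups, -1)
        mean = g.mean(dim=-1, keepdim=True)
        var = g.var(dim=-1, unbiased=False, keepdim=True)
        out = ((g - mean) / torch.sqrt(var + eps)).view_as(x)

        # Optional per-channel affine transform, broadcast over spatial dims.
        if weight is not None:
            out = out * weight.view(1, c, *([1] * (x.dim() - 2)))
        if bias is not None:
            out = out + bias.view(1, c, *([1] * (x.dim() - 2)))
        return out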
Looks good, but I did not do a math check. @KaimingHe, maybe you can look at it?
@ezyang Moved the code to ATen :). Now one of the tests runs 2x as fast. I should move LN and IN to ATen as well.
@pytorchbot retest this please
Looks good to me.
* upstream/master: (663 commits)
  Fix "command not found" error in perf test (pytorch#5982)
  add pip mkl-devel to the error message when mkl is found but mkl headers are not (pytorch#5984)
  Support batch LowerCholeskyTransform (pytorch#5980)
  Linearly interpolating upsampling fix (pytorch#5927)
  Store perf numbers in S3 (pytorch#5951)
  Modidy setup docs for Windows (pytorch#5981)
  Group Normalization (pytorch#5968)
  [distributions] Implement Power transform (pytorch#5976)
  Disable TestBottleneck test_cuda on Windows (pytorch#5977)
  Fix crash when cat-ing empty cuda tensors (pytorch#5971)
  Update no_unions flag for nanopb gen and update ONNX proto files (pytorch#5972)
  Expose gradients w.r.t. input & weight for conv1d, conv2d, conv3d in Python (pytorch#5408)
  Fixed non-determinate preprocessing on DataLoader (pytorch#4640)
  add AVX2 implementation for sigmoid function (pytorch#5010)
  Implement torch.util.bottleneck (pytorch#5216)
  Remove pragma once from cpp file (pytorch#5965)
  fix mvn docs (pytorch#5967)
  Fix incorrect rendering of Tensor.index_*_ doc examples. (pytorch#5969)
  Implement range for loop in script (pytorch#5827)
  Add windows doc (pytorch#5859)
  ...

# Conflicts:
#   aten/src/TH/generic/THTensorMath.c
#   torch/_tensor_docs.py
#   torch/csrc/generic/methods/TensorCompare.cwrap
BTW I was able to reproduce our ImageNet experiments in PyTorch.
@ppwwyyxx that's awesome!
Implements group normalization.
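A minimal usage sketch of the API this adds (functional and module forms; eps shown explicitly, 1e-5 being the default):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    x = torch.randn(8, 6, 32, 32)        # (N, C, H, W) with C = 6 channels

    # Functional form: 3 groups of 2 channels each, optional affine params.
    out = F.group_norm(x, 3, weight=torch.ones(6), bias=torch.zeros(6), eps=1e-5)

    # Module form with learnable per-channel affine parameters.
    gn = nn.GroupNorm(num_groups=3, num_channels=6)
    out = gn(x)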