Enable resetting of batchnorm running stats and cumulative ("simple") moving average #5766

jma127 · 2018-03-14T03:06:56Z

This is a proposed implementation of reset_running_stats(), which I discussed with @soumith earlier today. In addition, it enables using the simple (cumulative) moving average for the running stats in place of the exponential moving average, via a momentum=None setting.
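For reference, here is the arithmetic behind the two options. This is a sketch of the idea, not the PR's code. It uses the momentum convention from PyTorch's BatchNorm documentation (new estimate = (1 - momentum) * old estimate + momentum * batch statistic). Here x̂_t is the running estimate after t batches and x_t is the statistic (mean or variance) of batch t. Using a weight of 1/t instead of a fixed momentum turns the update into the plain arithmetic mean of all batches seen since the last reset. At t = 1 the weight is 1, so the starting value x̂_0 is discarded.

```latex
\hat{x}_t = (1 - \mathrm{momentum})\,\hat{x}_{t-1} + \mathrm{momentum}\cdot x_t
  \qquad \text{(exponential, fixed momentum)}

\hat{x}_t = \hat{x}_{t-1} + \tfrac{1}{t}\,(x_t - \hat{x}_{t-1})
          = \frac{1}{t}\sum_{i=1}^{t} x_i
  \qquad \text{(cumulative, momentum=None)}
```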

The motivation behind these changes is to faithfully reproduce the moment estimation from the original batchnorm paper (Algorithm 2, line 10). To do that, you would do something like:

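```python
import torch
import torch.nn as nn

# momentum=None selects the cumulative moving average (use the 1d/2d/3d variant you need)
bn_layer = nn.BatchNorm2d(..., momentum=None)

# construct the rest of the network and train

bn_layer.reset_running_stats()
model.train()  # running stats are only updated in training mode
with torch.no_grad():  # no gradients are needed to re-estimate the moments
    for i in range(num_bn_meanvar_batches):
        model(get_new_batch())

bn_layer.eval()

# use the model
```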
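To make the effect of reset_running_stats() plus momentum=None concrete without a full network, here is a toy, pure-Python model of a single running-mean buffer. `RunningStat` and `estimate` are hypothetical names used only for illustration; this is not PyTorch's implementation. The update rule follows the momentum convention from PyTorch's BatchNorm documentation, with a weight of 1/t when momentum is None.

```python
from typing import List, Optional


class RunningStat:
    """Hypothetical stand-in for one BatchNorm running-mean buffer (illustration only)."""

    def __init__(self, momentum: Optional[float] = 0.1) -> None:
        self.momentum = momentum
        self.reset_running_stats()

    def reset_running_stats(self) -> None:
        # Start from scratch: zero estimate, no batches seen yet.
        self.running_mean = 0.0
        self.num_batches_tracked = 0

    def update(self, batch_mean: float) -> None:
        self.num_batches_tracked += 1
        if self.momentum is None:
            # Cumulative ("simple") moving average: batch t gets weight 1/t.
            factor = 1.0 / self.num_batches_tracked
        else:
            # Exponential moving average with a fixed weight.
            factor = self.momentum
        self.running_mean = (1.0 - factor) * self.running_mean + factor * batch_mean


def estimate(batch_means: List[float], momentum: Optional[float]) -> float:
    stat = RunningStat(momentum=momentum)
    stat.update(100.0)          # stale statistic left over from training
    stat.reset_running_stats()  # discard it before re-estimating
    for m in batch_means:
        stat.update(m)
    return stat.running_mean


batch_means = [1.0, 2.0, 3.0, 6.0]
print(estimate(batch_means, momentum=None))  # cumulative average
print(estimate(batch_means, momentum=0.1))   # exponential average
```

After the reset, the stale value is gone and the cumulative average lands on the arithmetic mean of the batch means (3.0 up to rounding), while the exponential average with momentum=0.1 is still pulled toward its zero starting point after only four batches.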

goldsborough · 2018-03-14T04:44:17Z

@pytorchbot retest this please

jma127 · 2018-03-14T05:08:50Z

(For some possibly confusing chicken-scratch notes on why the cumulative moving average (CMA) is needed, see https://www.overleaf.com/read/xydtqhmhmtdz)

goldsborough · 2018-03-14T05:27:53Z

@pytorchbot retest this please

goldsborough · 2018-03-14T05:45:10Z

Our CI was having issues, but it now works again and the failure is on your side. Please fix your PR :) Looks like it's something in the JIT.

jma127 · 2018-03-14T05:49:06Z

OK, wasn't sure about that -- thanks for confirming! Looks like I hadn't updated an expected value in one of the JIT unit tests; fixed now.

jma127 · 2018-03-14T05:49:45Z

@pytorchbot retest this please

goldsborough · 2018-03-14T05:52:38Z

@pytorchbot retest this please

ssnl

Looks good in general, except for minor comments. After fixing and rebasing, this should be mergeable.

torch/nn/modules/batchnorm.py

ssnl · 2018-03-17T01:45:15Z

It would also be great if we could test the arithmetic average in test_nn.py.
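A test along these lines could check the arithmetic average directly against the real module. This is a hedged sketch, not the test that was eventually added to test_nn.py. It assumes a PyTorch version where BatchNorm accepts momentum=None and provides reset_running_stats(). It also relies on the documented behaviour that the running variance is accumulated from the unbiased batch variance. `check_cumulative_average` is a hypothetical helper name.

```python
import torch
import torch.nn as nn


def check_cumulative_average(num_batches: int = 5, batch_size: int = 16) -> bool:
    torch.manual_seed(0)
    bn = nn.BatchNorm1d(3, momentum=None)  # cumulative moving average
    bn.train()                # running stats are only updated in training mode
    bn.reset_running_stats()  # start from a clean state

    batches = [torch.randn(batch_size, 3) * 2 + 1 for _ in range(num_batches)]
    with torch.no_grad():
        for x in batches:
            bn(x)

    # Expected: plain arithmetic average of the per-batch statistics.
    # x.var(dim=0) is the unbiased variance, matching what BatchNorm accumulates.
    expected_mean = torch.stack([x.mean(dim=0) for x in batches]).mean(dim=0)
    expected_var = torch.stack([x.var(dim=0) for x in batches]).mean(dim=0)
    mean_ok = torch.allclose(bn.running_mean, expected_mean, atol=1e-5)
    var_ok = torch.allclose(bn.running_var, expected_var, atol=1e-5)
    return mean_ok and var_ok


print(check_cumulative_average())
```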

jma127 · 2018-03-17T02:10:38Z

@pytorchbot retest this please

jma127 · 2018-03-17T02:32:33Z

@pytorchbot retest this please

ssnl · 2018-03-17T02:45:20Z

It seems that there are gloo changes. Could you try git submodule update --init to resolve it?

jma127 · 2018-03-17T02:46:30Z

Whoops, yeah. There are also some failing tests -- I will debug and @mention you when ready.

jma127 · 2018-03-17T02:54:38Z

@pytorchbot retest this please

jma127 · 2018-03-17T05:05:28Z

@pytorchbot retest this please

jma127 · 2018-03-17T06:16:25Z

@ssnl ready for another look!

ssnl

Only one minor nit. Otherwise looks great!

test/test_nn.py


jma127 · 2018-03-18T03:38:40Z

@pytorchbot retest this please

ssnl

this can be merged if tests pass :) Thanks @jma127 !

goldsborough · 2018-03-18T03:41:40Z

@jma127 is the CI not working? Tests start automatically (within ~10 secs) every time you push new commits (i.e. usually there's no need to use pytorchbot) -- just FYI

jma127 · 2018-03-18T03:52:50Z

Oh OK, good to know :P

jma127 · 2018-03-18T04:40:31Z

One test is giving a 404 right now, so I'll restart again.

jma127 · 2018-03-18T04:40:38Z

@pytorchbot retest this please

jma127 · 2018-03-18T04:55:51Z

Hmm, now I get:

04:51:16 Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Delete http://%2Fvar%2Frun%2Fdocker.sock/v1.32/containers/4631e92d8a4fd7a9deca52381a4024a52713bcd3979b6425abdff43d7fa91feb?force=1: dial unix /var/run/docker.sock: connect: permission denied

Will retest one more time -- if it still persists, @goldsborough could you investigate?

jma127 · 2018-03-18T04:55:57Z

@pytorchbot retest this please

goldsborough · 2018-03-18T04:57:32Z

Sure. Cc @ezyang (our CI wizard)

ezyang · 2018-03-18T05:15:22Z

@pytorchbot retest this please

goldsborough · 2018-03-18T06:14:43Z

Let's merge it


…simple") moving average" (#5892)

* Revert "Port ATen and JIT C++ tests to Catch2 (#5788)" This reverts commit 6f80023.
* Revert "Fix error message for cat-ing zero-dim tensors (#5819)" This reverts commit cf2e176.
* Revert "Softmax symbolic should account for negative dim (#5846)" This reverts commit ba64724.
* Revert "[fft][1 of 3] build system and helpers to support cuFFT and MKL (#5855)" This reverts commit 22ef8e5.
* Revert "Don't modify requires_grad when running DataParallel in no_grad mode (#5880)" This reverts commit d11b7fb.
* Revert "fix some methods not showing up in doc (#5882)" This reverts commit 24fca0e.
* Revert "ReduceOps cleanup and set_num_threads (#5723)" This reverts commit 84400d5.
* Revert "introduce shape_as_tensor and reshape_from_variable_shape (#5824)" This reverts commit f446b82.
* Revert "Enable resetting of batchnorm running moments and cumulative ("simple") moving average (#5766)" This reverts commit 99b1f6c.


ssnl requested changes Mar 17, 2018


torch/nn/modules/batchnorm.py Outdated


torch/nn/modules/batchnorm.py Outdated


ssnl reviewed Mar 17, 2018


test/test_nn.py Outdated


Enable resetting of batchnorm running moments and cumulative ("simple") moving average

1ba0ccf

ssnl approved these changes Mar 18, 2018


soumith merged commit 99b1f6c into pytorch:master Mar 19, 2018

bddppq added a commit to onnxbot/onnx-fb-universe that referenced this pull request Mar 19, 2018

Update test expect files

43184bf

pytorch/pytorch#5766

bddppq mentioned this pull request Mar 19, 2018

Update test expect files onnxbot/onnx-fb-universe#1193

Closed

soumith mentioned this pull request Mar 19, 2018

BatchNorm checkpoints are broken with the latest master #5881

Closed

soumith added a commit that referenced this pull request Mar 19, 2018

Revert "Enable resetting of batchnorm running moments and cumulative ("simple") moving average (#5766)"

d30e280

This reverts commit 99b1f6c.

soumith mentioned this pull request Mar 19, 2018

Revert "Enable resetting of batchnorm running stats and cumulative ("simple") moving average" #5892

Merged

jma127 mentioned this pull request Apr 9, 2018

Enable resetting of batchnorm running moments and cumulative average #6445

Merged
