Skip to content

Conversation

@jma127
Copy link
Contributor

@jma127 jma127 commented Mar 14, 2018

This is a proposed implementation of reset_running_stats, which I discussed with @soumith earlier today. In addition, it enables the use of the simple moving average for running stats in place of the exponential moving average, with a momentum=None setting.

The motivation behind these changes is to faithfully execute the moment estimation of the original batchnorm paper (line 10 in Algorithm 2). To do this, you would do:

bn_layer = BatchNorm(..., momentum=None)

# construct the rest of the network and train

bn_layer.reset_running_stats()
for i in range(num_bn_meanvar_batches):
    model.forward(get_new_batch())

bn_layer.eval()

# use the model

@goldsborough
Copy link
Contributor

@pytorchbot retest this please

@jma127
Copy link
Contributor Author

jma127 commented Mar 14, 2018

(for some possibly confusing chicken-scratch motivation of why CMA is needed, see https://www.overleaf.com/read/xydtqhmhmtdz)

@goldsborough
Copy link
Contributor

@pytorchbot retest this please

@goldsborough
Copy link
Contributor

goldsborough commented Mar 14, 2018

Our CI was having issues, but it now works again and the failure is on your side. Please fix your PR :) Looks like it's something in the jit.

@jma127
Copy link
Contributor Author

jma127 commented Mar 14, 2018

OK, wasn't sure on that -- thanks for confirming! Looks like I didn't update some JIT unit test expected value, fixed now.

@jma127
Copy link
Contributor Author

jma127 commented Mar 14, 2018

@pytorchbot retest this please

1 similar comment
@goldsborough
Copy link
Contributor

@pytorchbot retest this please

Copy link
Collaborator

@ssnl ssnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good in general except for minor comments. after fixing and rebasing this should be mergeable.

This comment was marked as off-topic.

This comment was marked as off-topic.

@ssnl
Copy link
Collaborator

ssnl commented Mar 17, 2018

It would also be great if we can test the arithmetic average test_nn.py

@jma127
Copy link
Contributor Author

jma127 commented Mar 17, 2018

@pytorchbot retest this please

1 similar comment
@jma127
Copy link
Contributor Author

jma127 commented Mar 17, 2018

@pytorchbot retest this please

@ssnl
Copy link
Collaborator

ssnl commented Mar 17, 2018

It seems that there are gloo changes. Could you try git submodule update --init to resolve it?

@jma127
Copy link
Contributor Author

jma127 commented Mar 17, 2018

Whoops, yeah. There are also some failing tests -- I will debug and @mention you when ready.

@jma127
Copy link
Contributor Author

jma127 commented Mar 17, 2018

@pytorchbot retest this please

1 similar comment
@jma127
Copy link
Contributor Author

jma127 commented Mar 17, 2018

@pytorchbot retest this please

@jma127
Copy link
Contributor Author

jma127 commented Mar 17, 2018

@ssnl ready for another look!

Copy link
Collaborator

@ssnl ssnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one minor nit. Otherwise looks great!

test/test_nn.py Outdated

This comment was marked as off-topic.

@jma127
Copy link
Contributor Author

jma127 commented Mar 18, 2018

@pytorchbot retest this please

Copy link
Collaborator

@ssnl ssnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be merged if tests pass :) Thanks @jma127 !

@goldsborough
Copy link
Contributor

goldsborough commented Mar 18, 2018

@jma127 is the CI not working? Tests start automatically (within ~10 secs) every time you push new commits (i.e. Usually no need to use pytorchbot) -- just FYI

@jma127
Copy link
Contributor Author

jma127 commented Mar 18, 2018

Oh OK, good to know :P

@jma127
Copy link
Contributor Author

jma127 commented Mar 18, 2018

One test is giving a 404 right now so I'll restart again.

@jma127
Copy link
Contributor Author

jma127 commented Mar 18, 2018

@pytorchbot retest this please

@jma127
Copy link
Contributor Author

jma127 commented Mar 18, 2018

Hmm, now I get:

04:51:16 Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Delete http://%2Fvar%2Frun%2Fdocker.sock/v1.32/containers/4631e92d8a4fd7a9deca52381a4024a52713bcd3979b6425abdff43d7fa91feb?force=1: dial unix /var/run/docker.sock: connect: permission denied

Will retest one more time -- if it still persists, @goldsborough could you investigate?

@jma127
Copy link
Contributor Author

jma127 commented Mar 18, 2018

@pytorchbot retest this please

@goldsborough
Copy link
Contributor

goldsborough commented Mar 18, 2018

Sure. Cc @ezyang (our CI wizard)

@ezyang
Copy link
Contributor

ezyang commented Mar 18, 2018

@pytorchbot retest this please

@goldsborough
Copy link
Contributor

Let's merge it

@soumith soumith merged commit 99b1f6c into pytorch:master Mar 19, 2018
bddppq added a commit to onnxbot/onnx-fb-universe that referenced this pull request Mar 19, 2018
soumith added a commit that referenced this pull request Mar 19, 2018
soumith added a commit that referenced this pull request Mar 19, 2018
…simple") moving average" (#5892)

* Revert "Port ATen and JIT C++ tests to Catch2 (#5788)"

This reverts commit 6f80023.

* Revert "Fix error message for cat-ing zero-dim tensors (#5819)"

This reverts commit cf2e176.

* Revert "Softmax symbolic should account for negative dim (#5846)"

This reverts commit ba64724.

* Revert "[fft][1 of 3] build system and helpers to support cuFFT and MKL (#5855)"

This reverts commit 22ef8e5.

* Revert "Don't modify requires_grad when running DataParallel in no_grad mode (#5880)"

This reverts commit d11b7fb.

* Revert "fix some methods not showing up in doc (#5882)"

This reverts commit 24fca0e.

* Revert "ReduceOps cleanup and set_num_threads (#5723)"

This reverts commit 84400d5.

* Revert "introduce shape_as_tensor and reshape_from_variable_shape (#5824)"

This reverts commit f446b82.

* Revert "Enable resetting of batchnorm running moments and cumulative ("simple") moving average (#5766)"

This reverts commit 99b1f6c.
jekbradbury pushed a commit to jekbradbury/pytorch that referenced this pull request Mar 21, 2018
…simple") moving average" (pytorch#5892)

* Revert "Port ATen and JIT C++ tests to Catch2 (pytorch#5788)"

This reverts commit 6f80023.

* Revert "Fix error message for cat-ing zero-dim tensors (pytorch#5819)"

This reverts commit cf2e176.

* Revert "Softmax symbolic should account for negative dim (pytorch#5846)"

This reverts commit ba64724.

* Revert "[fft][1 of 3] build system and helpers to support cuFFT and MKL (pytorch#5855)"

This reverts commit 22ef8e5.

* Revert "Don't modify requires_grad when running DataParallel in no_grad mode (pytorch#5880)"

This reverts commit d11b7fb.

* Revert "fix some methods not showing up in doc (pytorch#5882)"

This reverts commit 24fca0e.

* Revert "ReduceOps cleanup and set_num_threads (pytorch#5723)"

This reverts commit 84400d5.

* Revert "introduce shape_as_tensor and reshape_from_variable_shape (pytorch#5824)"

This reverts commit f446b82.

* Revert "Enable resetting of batchnorm running moments and cumulative ("simple") moving average (pytorch#5766)"

This reverts commit 99b1f6c.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants