Conversation

@csarofeen
Contributor

Batch norm layer that keeps its parameters in 32-bit precision when using 16-bit input/output. This layer is necessary for successful pseudo-fp16 training.
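
A minimal sketch of the idea in plain PyTorch, for illustration only (the module name and the upcast/downcast approach are assumptions of mine; this PR instead wires fp32 accumulation tensors through the cuDNN backend):

```python
import torch
import torch.nn as nn

class BatchNorm2dFP32Params(nn.BatchNorm2d):
    """Hypothetical analogue: fp16 input/output, fp32 parameters and stats.

    nn.BatchNorm2d keeps weight/bias/running stats in fp32 by default; we
    only upcast the fp16 input so the normalization accumulates in fp32,
    then cast the result back to the input's dtype.
    """

    def forward(self, x):
        return super().forward(x.float()).to(x.dtype)

# fp16 activations flow through; parameters and running stats stay fp32.
bn = BatchNorm2dFP32Params(16)
out = bn(torch.randn(8, 16, 32, 32, dtype=torch.float16))
assert out.dtype == torch.float16 and bn.weight.dtype == torch.float32
```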

@soumith
Contributor

Thanks for introducing nn.contrib. A long-standing, needed change.

@ngimel
Collaborator

ngimel commented Aug 29, 2017

It generates unused-variable warnings, because now all cuDNN routines call PyObject* accumTensorClass = getAccumTensorClass(args); even though only batch norm uses a differently typed accumulation tensor. I could not figure out how to get around it; I would appreciate pointers on how to do that in cwrap.

@csarofeen
Contributor Author

Removed the BN layer in torch/nn; left the backend in place.


@apaszke
Contributor

apaszke commented Nov 10, 2017

I'm not sure what's going on in this change. It changes the BatchNorm code path in C++, but the stats will still be half on the Python side? Also, it's going to conflict with @ezyang's changes (to no longer use THVoidTensor).

@soumith
Contributor

soumith commented Nov 10, 2017

It's a rebased branch to get Zach going with Volta fp16.
After Ed's changes all go in, this can be rebased on top.

@apaszke
Contributor

apaszke commented Nov 10, 2017

Yeah, but this is lacking the Python changes. I can't see how it will not error out when you try to use BatchNorm.

@ngimel
Collaborator

ngimel commented Nov 10, 2017

@apaszke, it does not touch the current batch norm in any way; it just allows calling cuDNN batch norm with fp16 inputs from a custom Python module. It will have to be changed after Ed's changes go in.
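
Concretely, the enabled pattern looks roughly like the following (a hedged sketch written against today's torch.batch_norm ATen entry point, which postdates this PR; the exact backend call here differed):

```python
import torch

# Hypothetical sketch: fp16 activations with fp32 affine parameters and
# running statistics, dispatched to the cuDNN batch norm kernel.
x = torch.randn(8, 16, 32, 32, device="cuda", dtype=torch.float16)
weight = torch.ones(16, device="cuda")         # fp32
bias = torch.zeros(16, device="cuda")          # fp32
running_mean = torch.zeros(16, device="cuda")  # fp32, updated in place
running_var = torch.ones(16, device="cuda")    # fp32, updated in place

out = torch.batch_norm(x, weight, bias, running_mean, running_var,
                       True,   # training: update running stats
                       0.1,    # momentum
                       1e-5,   # eps
                       True)   # cudnn_enabled: prefer the cuDNN path
assert out.dtype == torch.float16
```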

csarofeen added a commit to csarofeen/examples that referenced this pull request Nov 10, 2017
@csarofeen
Contributor Author

Replaced by #4021 due to the cuDNN rewrite in ATen.

@csarofeen closed this Dec 5, 2017
@csarofeen deleted the BNfp16 branch February 12, 2020