
Conversation

vedanuj (Contributor) commented Feb 26, 2018

This PR addresses issue #5024 and exposes, in Python, the gradients with respect to the input and the weight for conv2d and conv3d.

Gradients with respect to the input of the convolution (F.grad.conv2d_input, F.grad.conv3d_input):

For this, conv_transpose2d/conv_transpose3d are used, since they are essentially the gradients of conv2d/conv3d w.r.t. the input. The output_padding required by conv_transpose2d/conv_transpose3d is implicitly calculated from the shape of the input gradient, the shape of the output gradient, the kernel size, the stride, and the padding.
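For concreteness, here is a minimal sketch of that output_padding computation (a hypothetical helper, assuming tuple-valued stride/padding and ignoring dilation and groups; not this PR's exact code):

```python
import torch.nn.functional as F

def grad_input_padding(grad_output, input_size, stride, padding, kernel_size):
    # For each spatial dim, a forward conv with these parameters maps an
    # input of size n to floor((n + 2*pad - k) / stride) + 1 outputs, so
    # several input sizes map to the same output size. output_padding is
    # the gap between the requested input size and the smallest such size.
    spatial_dims = grad_output.dim() - 2
    out_padding = []
    for d in range(spatial_dims):
        min_size = ((grad_output.size(d + 2) - 1) * stride[d]
                    - 2 * padding[d] + kernel_size[d])
        out_padding.append(input_size[d + 2] - min_size)
    return tuple(out_padding)

def conv2d_input_sketch(input_size, weight, grad_output, stride, padding):
    # grad_input is the transposed convolution of grad_output with the same
    # weight, padded back out to the original input shape.
    out_pad = grad_input_padding(grad_output, input_size, stride,
                                 padding, weight.shape[2:])
    return F.conv_transpose2d(grad_output, weight, stride=stride,
                              padding=padding, output_padding=out_pad)
```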

Gradients with respect to the weight (F.grad.conv2d_weight, F.grad.conv3d_weight):

For gradients with respect to the weight, we use conv2d/conv3d, but with the strides and dilations swapped. We reshape the output gradient and the input in such a way that the output gradient can be treated as the weight of a forward convolution.
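A minimal sketch of that reshaping trick for conv2d, assuming groups=1 and tuple-valued stride/padding/dilation (the function name and the simplifications are mine, not this PR's exact code):

```python
import torch.nn.functional as F

def conv2d_weight_sketch(input, weight_size, grad_output,
                         stride=(1, 1), padding=(0, 0), dilation=(1, 1)):
    batch, in_ch = input.shape[0], input.shape[1]
    out_ch = grad_output.shape[1]

    # Fold the batch into the channel dimension so that one grouped
    # convolution computes a correlation per (sample, input channel).
    inp = input.reshape(1, batch * in_ch, *input.shape[2:])
    # Treat grad_output as the "weight" of a forward convolution: one
    # single-channel filter per (sample, input channel, output channel).
    g = grad_output.repeat(1, in_ch, 1, 1).reshape(
        batch * in_ch * out_ch, 1, *grad_output.shape[2:])

    # The key trick: stride and dilation swap roles.
    gw = F.conv2d(inp, g, stride=dilation, padding=padding,
                  dilation=stride, groups=batch * in_ch)

    # Sum over the batch, move to (out_ch, in_ch, kH, kW) layout, and crop
    # the extra rows/cols that appear when the forward conv discarded part
    # of its input (e.g. with stride > 1).
    gw = gw.reshape(batch, in_ch, out_ch, gw.shape[2], gw.shape[3]).sum(0)
    return gw.transpose(0, 1)[:, :, :weight_size[2], :weight_size[3]]
```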

Please review @ezyang


soumith (Contributor) commented Feb 27, 2018

So, the reservation from Adam is valid, but not in the way he expressed it.
What we want is to not increase the API surface to accommodate each of these "backward" calls.
Imagine having:

F.conv2d
F.conv2d_backward_input
F.conv2d_backward_weight
F.conv2d_backward_bias

I think the right design would be a separate lookup table that branches one step further. For example:

F.grad.conv2d_input
F.grad.conv2d_weight
F.grad.conv2d_bias

This way, all this extra API is encapsulated in the grad tree.

ezyang (Contributor) commented Feb 28, 2018

grad is just a module, is that right? SGTM. @vedanuj, does this make sense?

vedanuj (Contributor, Author) commented Feb 28, 2018

@ezyang Yes, makes sense. If I have understood correctly, we should add all the gradient functions we will expose (now or in the future) via a separate module called grad, and add grad to functional.py so that we can enable something like F.grad.conv2d_input, etc. Does that sound correct?
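For reference, the wiring could look roughly like this (a sketch of the layout under discussion, with placeholder bodies; the actual PR code will differ):

```python
# torch/nn/grad.py -- the proposed namespace module; each function would
# hold one of the gradient computations described above.

def conv2d_input(input_size, weight, grad_output, stride=1, padding=0):
    """Gradient of conv2d w.r.t. its input (placeholder in this sketch)."""
    raise NotImplementedError

def conv2d_weight(input, weight_size, grad_output, stride=1, padding=0):
    """Gradient of conv2d w.r.t. its weight (placeholder in this sketch)."""
    raise NotImplementedError


# torch/nn/functional.py -- a single import exposes the tree as F.grad,
# keeping all the extra API under the grad namespace:
from . import grad  # noqa: F401
```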

soumith (Contributor) commented Feb 28, 2018

@vedanuj correct

apaszke (Contributor) commented Feb 28, 2018

I thought about the namespace, and it's not a significantly better solution (the complexity remains the same), but I'm OK with it. One extra thing we might want to think about before we start adding more and more functions there: sometimes the derivative computations for a single op can share a lot of intermediates, so we might need to redesign the API (to avoid having one function per computed gradient).

ezyang added the "awaiting response (this tag is deprecated)" label Mar 1, 2018
vedanuj changed the title from "[WIP] Expose Conv2dBackward in python" to "[WIP] Expose gradients w.r.t. input/weight for conv2d, conv3d in Python" Mar 6, 2018


ezyang (Contributor) commented Mar 6, 2018

Needs tests.

vedanuj (Contributor, Author) commented Mar 11, 2018

The last commit adds tests for all the grad functions and addresses the other comments.

Tests:

python test/test_nn.py TestNN.test_grad_conv2d_input

python test/test_nn.py TestNN.test_grad_conv2d_weight

python test/test_nn.py TestNN.test_grad_conv3d_input

python test/test_nn.py TestNN.test_grad_conv3d_weight
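The core of such a test is checking the exposed functions against autograd; a compressed sketch of the idea (not the PR's actual test code):

```python
import torch
import torch.nn.functional as F

# Compare the manual gradients against what autograd computes.
x = torch.randn(2, 3, 8, 8, requires_grad=True)
w = torch.randn(4, 3, 3, 3, requires_grad=True)
y = F.conv2d(x, w, stride=2, padding=1)
gy = torch.randn_like(y)

gx_auto, gw_auto = torch.autograd.grad(y, (x, w), gy)
gx_manual = F.grad.conv2d_input(x.shape, w, gy, stride=2, padding=1)
gw_manual = F.grad.conv2d_weight(x, w.shape, gy, stride=2, padding=1)

assert torch.allclose(gx_auto, gx_manual, atol=1e-5)
assert torch.allclose(gw_auto, gw_manual, atol=1e-5)
```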

@ezyang @apaszke Please re-review it.

vedanuj changed the title from "[WIP] Expose gradients w.r.t. input/weight for conv2d, conv3d in Python" to "Expose gradients w.r.t. input/weight for conv2d, conv3d in Python" Mar 11, 2018
ezyang (Contributor) commented Mar 12, 2018

@pytorchbot test this please


vedanuj (Contributor, Author) commented Mar 14, 2018

Commit 23a6b3a adds the conv1d gradients w.r.t. input (F.grad.conv1d_input) and weight (F.grad.conv1d_weight), and refactors the tests for all the conv grad functions.

vedanuj (Contributor, Author) commented Mar 14, 2018

@pytorchbot retest this please

vedanuj changed the title from "Expose gradients w.r.t. input/weight for conv2d, conv3d in Python" to "Expose gradients w.r.t. input & weight for conv1d, conv2d, conv3d in Python" Mar 14, 2018
ezyang (Contributor) commented Mar 15, 2018

@pytorchbot retest this please

vedanuj (Contributor, Author) commented Mar 19, 2018

Please review @ezyang @apaszke

apaszke (Contributor) left a comment

LGTM, but I have some minor comments


vedanuj (Contributor, Author) commented Mar 23, 2018

The new commits change some parameter names and remove the use of the deprecated Variable.

@apaszke The TensorFlow APIs for the similar methods require the full shape for the input size and weight size. Should we keep it the same as TF, or change input size and weight size to specify only the spatial dimensions (height, width, depth)?

apaszke (Contributor) commented Mar 23, 2018

No, if we're going to accept the shape then it should be the full shape. Let's leave it as is.
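So a caller passes the full tensor shapes, for example (an illustrative call, assuming the signatures discussed in this PR):

```python
import torch
import torch.nn.functional as F

weight = torch.randn(8, 3, 5, 5)          # (out_ch, in_ch, kH, kW)
grad_output = torch.randn(1, 8, 28, 28)   # (N, out_ch, H', W')

# input_size is the full shape (N, C, H, W), not just the spatial dims:
grad_input = F.grad.conv2d_input((1, 3, 32, 32), weight, grad_output)
assert grad_input.shape == (1, 3, 32, 32)
```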

ezyang merged commit f3e16cc into pytorch:master Mar 23, 2018
vedanuj deleted the conv_2d_backward branch March 24, 2018
sighingnow added a commit to sighingnow/pytorch that referenced this pull request Mar 25, 2018
* upstream/master: (663 commits)
  Fix "command not found" error in perf test (pytorch#5982)
  add pip mkl-devel to the error message when mkl is found but mkl headers are not (pytorch#5984)
  Support batch LowerCholeskyTransform (pytorch#5980)
  Linearly interpolating upsampling fix (pytorch#5927)
  Store perf numbers in S3 (pytorch#5951)
  Modidy setup docs for Windows (pytorch#5981)
  Group Normalization (pytorch#5968)
  [distributions] Implement Power transform (pytorch#5976)
  Disable TestBottleneck test_cuda on Windows (pytorch#5977)
  Fix crash when cat-ing empty cuda tensors (pytorch#5971)
  Update no_unions flag for nanopb gen and update ONNX proto files (pytorch#5972)
  Expose gradients w.r.t. input & weight for conv1d, conv2d, conv3d in Python (pytorch#5408)
  Fixed non-determinate preprocessing on DataLoader (pytorch#4640)
  add AVX2 implementation for sigmoid function (pytorch#5010)
  Implement torch.util.bottleneck (pytorch#5216)
  Remove pragma once from cpp file (pytorch#5965)
  fix mvn docs (pytorch#5967)
  Fix incorrect rendering of Tensor.index_*_ doc examples. (pytorch#5969)
  Implement range for loop in script (pytorch#5827)
  Add windows doc (pytorch#5859)
  ...

# Conflicts:
#	aten/src/TH/generic/THTensorMath.c
#	torch/_tensor_docs.py
#	torch/csrc/generic/methods/TensorCompare.cwrap