
Conversation

@zdevito (Contributor) commented Jun 19, 2018

This commit implements the solution proposed in #8410
to work around the need to create zero tensors with the same shape as the inputs.
It introduces the concept of a LinearBlock, which marks places in the code
where we know that if all the inputs to a node are zero, then its outputs
are also zero. Autodiff introduces LinearBlocks around backward functions,
which have this property. The specializeUndef pass then propagates Undef
nodes using this information.
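
To make that propagation step concrete, here is a minimal sketch over a toy, invented IR (ToyLinearBlock and UNDEF are hypothetical names, not the actual JIT data structures): when every input to a LinearBlock is known to be undefined, the pass can replace all of the block's outputs with Undef without ever needing their shapes.

```python
# Toy illustration only; the real pass operates on the C++ JIT IR.
from dataclasses import dataclass
from typing import List

UNDEF = object()  # stands in for an undefined tensor / prim::Undef

@dataclass
class ToyLinearBlock:
    inputs: List[object]   # values flowing into the block
    outputs: List[object]  # values the block produces

def specialize_undef(block: ToyLinearBlock) -> None:
    # LinearBlock invariant: if all inputs are zero (modelled here as UNDEF),
    # then all outputs are zero as well, so no shape information is needed.
    if all(x is UNDEF for x in block.inputs):
        block.outputs = [UNDEF] * len(block.outputs)
```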

Notes:

  • Since we do not always specialize, a LowerLinearBlocks pass replaces
    each LinearBlock with an if statement that dynamically guards the
    Undef case.
  • We introduce AutogradAdd, an addition op that still works when its
    inputs might be undefined. When we specialize, it is replaced by a
    normal add, but there are cases where gradient graphs are not
    specialized (e.g. when they are not differentiable but a derivative
    is required), so it is important for this op to be executable. A
    rough sketch of both behaviors is given after this list.
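
As a rough sketch of the runtime behaviour these two notes describe (the helper names below are hypothetical, not the actual prim::AutogradAdd or LowerLinearBlocks implementations): an undefined gradient is treated like zero by the add, and the lowered guard skips the backward body entirely when every incoming gradient is undefined.

```python
from typing import Callable, List, Optional
import torch

def autograd_add(a: Optional[torch.Tensor],
                 b: Optional[torch.Tensor]) -> Optional[torch.Tensor]:
    # Sketch of the AutogradAdd idea: an undefined input behaves like zero,
    # so the defined operand (or nothing at all) passes through unchanged.
    if a is None:
        return b
    if b is None:
        return a
    return a + b

def guarded_linear_block(inputs: List[Optional[torch.Tensor]],
                         body: Callable[..., List[Optional[torch.Tensor]]],
                         num_outputs: int) -> List[Optional[torch.Tensor]]:
    # Sketch of the dynamic guard described for LowerLinearBlocks: if every
    # input is undefined, every output is undefined, so the body never runs
    # and no zero tensors have to be materialized.
    if all(x is None for x in inputs):
        return [None] * num_outputs
    return body(*inputs)

# Example: accumulating gradients when one branch contributed nothing.
print(autograd_add(None, torch.ones(3)))  # tensor([1., 1., 1.])
```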

@zdevito force-pushed the pr/zero_free_derivatives branch 2 times, most recently from c23b9e7 to 3a4b65f on June 19, 2018 (17:19)
@zdevito (Contributor, Author) commented Jun 20, 2018

@apaszke @ezyang When you get a chance, this switches the derivatives as we discussed.

@apaszke (Contributor) commented Jun 20, 2018

Yep, I'll make sure to get back to you with a review tomorrow.


@ezyang (Contributor) left a comment

Looks great, very clear.

@apaszke (Contributor) left a comment

Looks great, and definitely simplifies the autodiff code!


@zdevito force-pushed the pr/zero_free_derivatives branch 2 times, most recently from cf19324 to 7f3534b on June 25, 2018 (19:14)
Allow autograd to work even when the shape of values cannot be determined

This commit implements the solution proposed in pytorch#8410
to work around the need to create zero tensors with the same shape as the inputs.
It introduces the concept of a LinearBlock, which marks places in the code
where we know that if all the inputs to a node are zero, then its outputs
are also zero. Autodiff introduces LinearBlocks around backward functions,
which have this property. The specializeUndef pass then propagates Undef
nodes using this information.

Notes:
* Since we do not always specialize, a LowerLinearBlocks pass replaces
each LinearBlock with an if statement that dynamically guards the Undef case.
* We introduce AutogradAdd, an addition op that still works when its inputs
might be undefined. When we specialize, it is replaced by a normal add, but
there are cases where gradient graphs are not specialized (e.g. when they
are not differentiable but a derivative is required), so it is important
for this op to be executable.
@zdevito force-pushed the pr/zero_free_derivatives branch from 7f3534b to e2b3828 on June 25, 2018 (22:12)
@zdevito merged commit f74207c into pytorch:master on Jun 26, 2018
petrex pushed a commit to ROCm/pytorch that referenced this pull request Jun 26, 2018
* upstream/master: (42 commits)
  [c10d] No default device for ProcessGroupGloo (pytorch#8888)
  Fix default values for affine= in the docstrings of InstanceNormXd (pytorch#8895)
  Stop making dynamic allocations of PinnedMemoryAllocator. (pytorch#8896)
  [C++ API]  Rework optimization package (pytorch#8815)
  Mention MPICH_MAX_THREAD_SAFETY=multiple. (pytorch#8580)
  Unify isViewable, handle n-dimensional empty tensors. (pytorch#8883)
  Add pos_weight argument to nn.BCEWithLogitsLoss (pytorch#5660) (pytorch#6856)
  [build] Enable clang-specific warnings only when using clang (pytorch#8869)
  Fix cmake cudnn autodetection (pytorch#8891)
  [c10d] Fix link order for building C++ tests (pytorch#8889)
  directly add_subdirectory(nanopb) from torch CMakeLists (pytorch#8870)
  [C++ API] Bag of fixes (pytorch#8843)
  [build] Raise in cmake when seeing NVCC{9/9.1} + GCC6 combo (pytorch#8863)
  Create avg_pool1d in ATen (pytorch#8880)
  throw error when grid_sample is passed unsupported mode (pytorch#8884)
  Allow autograd to work even when the shape of values cannot be determined (pytorch#8641)
  Make at::Tensor::to() const (pytorch#8839)
  [auto] Update onnx to 458c521 - Fix typo (onnx/onnx#1143) onnx/onnx@458c521
  [Caffe2] Fix gradient_check on in-place ops (pytorch#8828)
  Fix as_strided_backward (pytorch#8721)
  ...
eellison pushed a commit to eellison/pytorch that referenced this pull request Jul 10, 2018
Allow autograd to work even when the shape of values cannot be determined (pytorch#8641)
