
Conversation

@zdevito (Contributor) commented Jun 19, 2018

This commit implements the solution proposed in #8410
to work around the need to create zero tensors with the same shape as the inputs.
It introduces the concept of a LinearBlock, which marks places in the code
where we know that if all the inputs to a node are zero, then its outputs
are also zero. Autodiff introduces LinearBlocks around backward functions,
which have this property. The specializeUndef pass then propagates Undef
nodes using this information.
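
To make that propagation step concrete, here is a minimal sketch over a toy, invented IR (ToyLinearBlock and UNDEF are hypothetical names, not the actual JIT data structures): when every input to a LinearBlock is known to be undefined, the pass can replace all of the block's outputs with Undef without ever needing their shapes.

```python
# Toy illustration only; the real pass operates on the C++ JIT IR.
from dataclasses import dataclass
from typing import List

UNDEF = object()  # stands in for an undefined tensor / prim::Undef

@dataclass
class ToyLinearBlock:
    inputs: List[object]   # values flowing into the block
    outputs: List[object]  # values the block produces

def specialize_undef(block: ToyLinearBlock) -> None:
    # LinearBlock invariant: if all inputs are zero (modelled here as UNDEF),
    # then all outputs are zero as well, so no shape information is needed.
    if all(x is UNDEF for x in block.inputs):
        block.outputs = [UNDEF] * len(block.outputs)
```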

Notes:

  • Since we do not always specialize, a LowerLinearBlocks pass replaces
    each LinearBlock with an if statement that dynamically guards the
    Undef case.
  • We introduce AutogradAdd, an addition op that still works when its
    inputs might be undefined. When we specialize, it is replaced by a
    normal add, but there are cases where gradient graphs are not
    specialized (e.g. when they are not differentiable but a derivative
    is required), so it is important for this op to be executable. A
    rough sketch of both behaviors is given after this list.
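
As a rough sketch of the runtime behaviour these two notes describe (the helper names below are hypothetical, not the actual prim::AutogradAdd or LowerLinearBlocks implementations): an undefined gradient is treated like zero by the add, and the lowered guard skips the backward body entirely when every incoming gradient is undefined.

```python
from typing import Callable, List, Optional
import torch

def autograd_add(a: Optional[torch.Tensor],
                 b: Optional[torch.Tensor]) -> Optional[torch.Tensor]:
    # Sketch of the AutogradAdd idea: an undefined input behaves like zero,
    # so the defined operand (or nothing at all) passes through unchanged.
    if a is None:
        return b
    if b is None:
        return a
    return a + b

def guarded_linear_block(inputs: List[Optional[torch.Tensor]],
                         body: Callable[..., List[Optional[torch.Tensor]]],
                         num_outputs: int) -> List[Optional[torch.Tensor]]:
    # Sketch of the dynamic guard described for LowerLinearBlocks: if every
    # input is undefined, every output is undefined, so the body never runs
    # and no zero tensors have to be materialized.
    if all(x is None for x in inputs):
        return [None] * num_outputs
    return body(*inputs)

# Example: accumulating gradients when one branch contributed nothing.
print(autograd_add(None, torch.ones(3)))  # tensor([1., 1., 1.])
```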

@zdevito force-pushed the pr/zero_free_derivatives branch 2 times, most recently from c23b9e7 to 3a4b65f on June 19, 2018 (17:19)
@zdevito (Contributor, Author) commented Jun 20, 2018

@apaszke @ezyang When you get a chance, this switches the derivatives as we discussed.

@apaszke (Contributor) commented Jun 20, 2018

Yep, I'll make sure to get back to you with a review tomorrow.


@ezyang (Contributor) left a comment

Looks great, very clear.

@apaszke (Contributor) left a comment

Looks great, and definitely simplifies the autodiff code!


@zdevito force-pushed the pr/zero_free_derivatives branch 2 times, most recently from cf19324 to 7f3534b on June 25, 2018 (19:14)
Allow autograd to work even when the shape of values cannot be determined

This commit implements the solution proposed in pytorch#8410
to work around the need to create zero tensors with the same shape as the inputs.
It introduces the concept of a LinearBlock, which marks places in the code
where we know that if all the inputs to a node are zero, then its outputs
are also zero. Autodiff introduces LinearBlocks around backward functions,
which have this property. The specializeUndef pass then propagates Undef
nodes using this information.

Notes:
* Since we do not always specialize, a LowerLinearBlocks pass replaces
each LinearBlock with an if statement that dynamically guards the Undef case.
* We introduce AutogradAdd, an addition op that still works when its inputs
might be undefined. When we specialize, it is replaced by a normal add, but
there are cases where gradient graphs are not specialized (e.g. when they
are not differentiable but a derivative is required), so it is important
for this op to be executable.
@zdevito force-pushed the pr/zero_free_derivatives branch from 7f3534b to e2b3828 on June 25, 2018 (22:12)
@zdevito merged commit f74207c into pytorch:master on Jun 26, 2018
petrex pushed a commit to ROCm/pytorch that referenced this pull request Jun 26, 2018
* upstream/master: (42 commits)
  [c10d] No default device for ProcessGroupGloo (pytorch#8888)
  Fix default values for affine= in the docstrings of InstanceNormXd (pytorch#8895)
  Stop making dynamic allocations of PinnedMemoryAllocator. (pytorch#8896)
  [C++ API]  Rework optimization package (pytorch#8815)
  Mention MPICH_MAX_THREAD_SAFETY=multiple. (pytorch#8580)
  Unify isViewable, handle n-dimensional empty tensors. (pytorch#8883)
  Add pos_weight argument to nn.BCEWithLogitsLoss (pytorch#5660) (pytorch#6856)
  [build] Enable clang-specific warnings only when using clang (pytorch#8869)
  Fix cmake cudnn autodetection (pytorch#8891)
  [c10d] Fix link order for building C++ tests (pytorch#8889)
  directly add_subdirectory(nanopb) from torch CMakeLists (pytorch#8870)
  [C++ API] Bag of fixes (pytorch#8843)
  [build] Raise in cmake when seeing NVCC{9/9.1} + GCC6 combo (pytorch#8863)
  Create avg_pool1d in ATen (pytorch#8880)
  throw error when grid_sample is passed unsupported mode (pytorch#8884)
  Allow autograd to work even when the shape of values cannot be determined (pytorch#8641)
  Make at::Tensor::to() const (pytorch#8839)
  [auto] Update onnx to 458c521 - Fix typo (onnx/onnx#1143) onnx/onnx@458c521
  [Caffe2] Fix gradient_check on in-place ops (pytorch#8828)
  Fix as_strided_backward (pytorch#8721)
  ...
eellison pushed a commit to eellison/pytorch that referenced this pull request Jul 10, 2018
Allow autograd to work even when the shape of values cannot be determined (pytorch#8641)
