Conversation

@elanmart
Contributor

On my machine this PR gives the following on CPU:

%timeit _ = rnn_utils.pack_padded_sequence(x, lengths).data.sum().backward()
%timeit _ = old_utils.pack_padded_sequence(x, lengths).data.sum().backward()

# 100 loops, best of 3: 5.75 ms per loop  # this PR
# 1 loop, best of 3: 194 ms per loop      # master

and on GPU:

%timeit _ = rnn_utils.pack_padded_sequence(x_cuda, lengths).data.sum().backward()
%timeit _ = old_utils.pack_padded_sequence(x_cuda, lengths).data.sum().backward()

# 1000 loops, best of 3: 1.67 ms per loop  # this PR
# 100 loops, best of 3: 11.5 ms per loop   # master
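
For reference, the snippets above assume a padded batch x, a CUDA copy x_cuda, and sorted lengths are already in scope (old_utils being the pre-PR implementation from master). A minimal setup sketch with purely illustrative shapes, not the exact sizes behind the numbers above:

import torch
from torch.nn.utils import rnn as rnn_utils

# Illustrative sizes only: max length 50, batch of 32, feature size 128;
# lengths must be sorted in decreasing order for pack_padded_sequence.
lengths, _ = torch.sort(torch.randint(1, 51, (32,)), descending=True)
lengths = lengths.tolist()
x = torch.randn(50, 32, 128, requires_grad=True)
x_cuda = torch.randn(50, 32, 128, device='cuda', requires_grad=True)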

I've implemented this in Python since I couldn't get ATen to accept the batch_sizes as input to backward.
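
To sketch the general shape of such a Python-side implementation (this is an illustrative pattern with a made-up name, PackPadded, not the code in this PR), a custom autograd Function can compute batch_sizes in forward and scatter the packed gradient back into the padded layout in backward:

import torch

class PackPadded(torch.autograd.Function):
    # Sketch only: assumes input of shape (T, B, *) sorted by decreasing
    # length and lengths given as a Python list; returns just the packed data.

    @staticmethod
    def forward(ctx, input, lengths):
        # batch_sizes[t] = number of sequences still active at time step t
        batch_sizes = [sum(1 for l in lengths if l > t) for t in range(lengths[0])]
        ctx.batch_sizes = batch_sizes
        ctx.input_shape = input.shape
        # Concatenate the active rows of every time step into the packed data
        return torch.cat([input[t, :bs] for t, bs in enumerate(batch_sizes)], dim=0)

    @staticmethod
    def backward(ctx, grad_data):
        # Scatter the packed gradient back into the padded (T, B, *) layout
        grad_input = grad_data.new_zeros(ctx.input_shape)
        offset = 0
        for t, bs in enumerate(ctx.batch_sizes):
            grad_input[t, :bs] = grad_data[offset:offset + bs]
            offset += bs
        return grad_input, None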

There's an ugly conversion from a Variable to list when creating the PackedSequence tuple.

Also, two tests are failing because they assume that when pack_padded_sequence is called with Tensors, the PackedSequence.data will also be a Tensor, but here a Function converts it to a Variable. I'm not sure how this should be handled.

Please let me know if this PR makes sense and if there's anything I could fix. Oh, and also if I should somehow concatenate these commits into a single one.

@elanmart changed the title from "Pack padded function" to "Implement backward for pack_padded_sequence" on Dec 24, 2017
@apaszke
Contributor

apaszke commented Dec 29, 2017

@pytorchbot test this please

@apaszke closed this Dec 30, 2017
@apaszke reopened this Dec 30, 2017
@apaszke closed this Jan 1, 2018
@apaszke reopened this Jan 1, 2018
@apaszke
Contributor

apaszke commented Jan 1, 2018

I think the CI is failing because it used to work with Tensors, which is no longer the case. I don't think it's worth maintaining that support since it's not documented, and tensors are going to be merged with Variables before 0.4, so it's not a big deal. Can you please fix the tests?

@elanmart
Contributor Author

elanmart commented Jan 1, 2018

Hey Adam, I'm so sorry for the lack of response, I was on a short vacation without proper internet access.
I mentioned the issue with the tests in the PR description:

Also, two tests are failing because they assume that when pack_padded_sequence is called with Tensors, the PackedSequence.data will also be a Tensor, but here a Function converts it to a Variable. I'm not sure how this should be handled.

If it's OK, I'll change these tests tomorrow.

@apaszke
Contributor

apaszke commented Jan 1, 2018

Sorry, I forgot that you'd already mentioned this! There's no hurry, take your time.

yongjik and others added 14 commits January 6, 2018 23:00
* allow_inf on test_beta_log_prob
* Support allow_inf on assertAlmostEqual

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
- as_variable no longer needs to be an instance function
- mark functions as static
* Improve matmul native test tolerance.

Because we don't directly use bmm in one case of matmul, a comparison to bmm doesn't make sense;
instead, we compare to the double result.

* Fix spelling.
Adds a missing bias term to the __repr__ functions of the
Linear and Bilinear modules. Fixes the spacing in the Conv2d
__repr__ to make it consistent with other modules.
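
For what it's worth, with a recent PyTorch the repr of these modules reports the bias flag; the exact output format shown here is an assumption about current versions, not this commit's output:

>>> import torch.nn as nn
>>> nn.Linear(3, 5, bias=False)
Linear(in_features=3, out_features=5, bias=False)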
* Support ATen GPU pointwise apply and torch.where.

Like the CPU version, this implements an apply template that is almost identical to the
apply template already in THC, but using the ATen API.  Much of this involves stripping out
the TensorUtils code (which is basically templated ATen-style), although a couple of functions
remain that are apply specific (and thus don't seem worth porting to ATen), namely
overlappingIndices, canUse32BitIndexMath, and getTensorInfo.  We can make those generally
available if there's a need.

* Use int64_t instead of ptrdiff_t.

* Use snake case for _copyIgnoringOverlaps_.
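
As a quick reference for the user-facing piece, torch.where picks elements from two tensors based on a condition; a minimal usage sketch with a recent PyTorch:

import torch

cond = torch.tensor([True, False, True])
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([10.0, 20.0, 30.0])
torch.where(cond, a, b)   # tensor([ 1., 20.,  3.])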
Currently 1-layer RNN is supported
ezyang and others added 27 commits January 6, 2018 23:00
* Delete obsolete basic ops.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* More deletion.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete some unused utilities.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete dead apply_fn

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete CppFunction symbolic support.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete ForwardFunction

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Batchnorm is 'working'

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* add gumbel_softmax, based on Eric Jang's implementation

* Make gumbel_softmax CUDA friendly

* gumbel_softmax tweaks
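
For context, the Gumbel-softmax trick perturbs the logits with Gumbel(0, 1) noise and applies a temperature-scaled softmax; a self-contained sketch of the idea, not necessarily the code added here:

import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0, eps=1e-20):
    # Gumbel(0, 1) noise via -log(-log(U)) with U ~ Uniform(0, 1)
    u = torch.rand_like(logits)
    gumbel = -torch.log(-torch.log(u + eps) + eps)
    # Softmax over the perturbed logits gives a differentiable relaxation
    return F.softmax((logits + gumbel) / tau, dim=-1)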
* Deprecate nn.NLLLoss2d

* Fix legacy tests

* Fix tests

* Remove NLLLoss2d from docs, add deprecation warning instead of error

* fix lint

* Add more to docs
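
The replacement path is plain nn.NLLLoss, which in current PyTorch accepts spatial inputs directly; a small sketch:

import torch
import torch.nn as nn

# (N, C, H, W) log-probabilities against (N, H, W) class-index targets
log_probs = torch.log_softmax(torch.randn(2, 4, 8, 8), dim=1)
target = torch.randint(0, 4, (2, 8, 8))
loss = nn.NLLLoss()(log_probs, target)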
* Add test for empty Variable cat (forward only).

* Test for empty cat (no grad/gradgrad checks)

* Support gradcheck on empty inputs, check it for cat with an empty Variable.

* Fix lint.
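
A minimal example of the kind of check this enables, assuming a recent PyTorch:

import torch
from torch.autograd import gradcheck

empty = torch.randn(0, 3, dtype=torch.double, requires_grad=True)
full = torch.randn(2, 3, dtype=torch.double, requires_grad=True)
gradcheck(lambda a, b: torch.cat([a, b]), (empty, full))   # returns True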
….cwrap. (#4479)

The specification and logic aren't necessary anymore; it's fine to specify the default as nullptr.
BCELoss's outputs and gradInput computations are accurate to around 1e-6 on float types (as a relative value, not absolute), which is reasonable. However, the tests use absolute thresholds: the accumulation of 5 gradInputs has to have error less than 0.0002.

The worst case for BCELoss's gradInput for each element may be described as 1 / ((1 - x) * x). Previously, the input to the test was restricted to [0.02, 1 - 0.02], giving a worst-case largest gradInput of about 50, a total accumulated grad of 50 * 5 = 250, and therefore an error of 250 * 1e-6 = 0.00025, which was too big.

By restricting x to [0.028, 1 - 0.028] we get a worst case of 36.74, resulting in a total accumulated grad of 184, which is less than the 200 needed to keep the error below 0.0002.
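
A quick numeric check of that reasoning (per-element worst case 1 / ((1 - x) * x), five accumulated gradInputs, ~1e-6 relative accuracy):

old_bound, new_bound = 0.02, 0.028
worst_old = 1 / ((1 - old_bound) * old_bound)   # ~51.0 -> 5 * 51.0 * 1e-6 ~ 0.00026 > 0.0002
worst_new = 1 / ((1 - new_bound) * new_bound)   # ~36.74 -> 5 * 36.74 * 1e-6 ~ 0.00018 < 0.0002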
This mismatched paren causes a syntax error in generated code. I'm guessing the parentheses are necessary, since there was one in there before, but I don't actually know whether the compiler can produce things like a - (b - c) that would make them required.
* Supporting logits as parameters in Bernoulli and Categorical

* address comments

* fix lint

* modify binary_cross_entropy_with_logits

* address comments

* add descriptor for lazy attributes

* address comments
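
Usage-wise, the logits parameterization looks like this (standard torch.distributions API):

import torch
from torch.distributions import Bernoulli, Categorical

# Parameterize by logits instead of probabilities
b = Bernoulli(logits=torch.tensor([0.5, -1.0]))
c = Categorical(logits=torch.randn(3, 5))
sample = c.sample()
log_p = c.log_prob(sample)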
Basically, scalars and implicitly unsqueezed.
- Out-of-bounds grads[2] access (thnn_conv_depthwise2d_backward doesn't compute the bias gradient)

- Groups was not set appropriately for depthwise convolution

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
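
For reference, a depthwise convolution is an ordinary Conv2d with groups equal to the number of input channels; a small sketch that exercises the backward path touched here (whether the dedicated depthwise kernel is hit depends on the backend):

import torch
import torch.nn as nn

# One filter per input channel: groups == in_channels
conv = nn.Conv2d(in_channels=8, out_channels=8, kernel_size=3, groups=8, bias=True)
out = conv(torch.randn(1, 8, 16, 16))
out.sum().backward()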
Weight can be non-contiguous due to double backwards, where
we transpose the weight.  I'm not very happy with this fix
but it seems to make the tests pass.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This test would have caught the OOB in thnn_conv_depthwise2d_backward

Fixes #4457

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Previously, we only tested CPU double-backwards, which is bad! This would have caught #4422 (still not fixed, so those tests
are manually disabled) and also uncovered #4500 (not yet diagnosed).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
…tion

Signed-off-by: Edward Z. Yang <ezyang@fb.com>