Conversation

@elanmart
Contributor

On my machine this PR gives the following on CPU:

%timeit _ = rnn_utils.pack_padded_sequence(x, lengths).data.sum().backward()
%timeit _ = old_utils.pack_padded_sequence(x, lengths).data.sum().backward()

# 100 loops, best of 3: 5.75 ms per loop  # this PR
# 1 loop, best of 3: 194 ms per loop      # master

and on GPU:

%timeit _ = rnn_utils.pack_padded_sequence(x_cuda, lengths).data.sum().backward()
%timeit _ = old_utils.pack_padded_sequence(x_cuda, lengths).data.sum().backward()

# 1000 loops, best of 3: 1.67 ms per loop  # this PR
# 100 loops, best of 3: 11.5 ms per loop   # master
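
For reference, the snippets above assume a padded batch x, a CUDA copy x_cuda, and sorted lengths are already in scope (old_utils being the pre-PR implementation from master). A minimal setup sketch with purely illustrative shapes, not the exact sizes behind the numbers above:

import torch
from torch.nn.utils import rnn as rnn_utils

# Illustrative sizes only: max length 50, batch of 32, feature size 128;
# lengths must be sorted in decreasing order for pack_padded_sequence.
lengths, _ = torch.sort(torch.randint(1, 51, (32,)), descending=True)
lengths = lengths.tolist()
x = torch.randn(50, 32, 128, requires_grad=True)
x_cuda = torch.randn(50, 32, 128, device='cuda', requires_grad=True)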

I've implemented this in Python since I couldn't get ATen to accept the batch_sizes as input to backward.
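
To sketch the general shape of such a Python-side implementation (this is an illustrative pattern with a made-up name, PackPadded, not the code in this PR), a custom autograd Function can compute batch_sizes in forward and scatter the packed gradient back into the padded layout in backward:

import torch

class PackPadded(torch.autograd.Function):
    # Sketch only: assumes input of shape (T, B, *) sorted by decreasing
    # length and lengths given as a Python list; returns just the packed data.

    @staticmethod
    def forward(ctx, input, lengths):
        # batch_sizes[t] = number of sequences still active at time step t
        batch_sizes = [sum(1 for l in lengths if l > t) for t in range(lengths[0])]
        ctx.batch_sizes = batch_sizes
        ctx.input_shape = input.shape
        # Concatenate the active rows of every time step into the packed data
        return torch.cat([input[t, :bs] for t, bs in enumerate(batch_sizes)], dim=0)

    @staticmethod
    def backward(ctx, grad_data):
        # Scatter the packed gradient back into the padded (T, B, *) layout
        grad_input = grad_data.new_zeros(ctx.input_shape)
        offset = 0
        for t, bs in enumerate(ctx.batch_sizes):
            grad_input[t, :bs] = grad_data[offset:offset + bs]
            offset += bs
        return grad_input, None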

There's an ugly conversion from a Variable to list when creating the PackedSequence tuple.

Also, two tests are failing because they assume that when pack_padded_sequence is called with Tensors, the PackedSequence.data will also be a Tensor, but here a Function converts it to a Variable. I'm not sure how this should be handled.

Please let me know if this PR makes sense and if there's anything I could fix. Oh, and also if I should somehow concatenate these commits into a single one.

@elanmart changed the title from "Pack padded function" to "Implement backward for pack_padded_sequence" on Dec 24, 2017
@apaszke
Contributor

apaszke commented Dec 29, 2017

@pytorchbot test this please

@apaszke closed this Dec 30, 2017
@apaszke reopened this Dec 30, 2017
@apaszke closed this Jan 1, 2018
@apaszke reopened this Jan 1, 2018
@apaszke
Contributor

apaszke commented Jan 1, 2018

I think the CI is failing because it used to work with Tensors, which is no longer the case. I don't think it's worth maintaining that support since it's not documented, and tensors are going to be merged with Variables before 0.4, so it's not a big deal. Can you please fix the tests?

@elanmart
Contributor Author

elanmart commented Jan 1, 2018

Hey Adam, I'm so sorry for the lack of response, I was on a short vacation without proper internet access.
I mentioned the issue with the tests in the PR description:

Also, two tests are failing because they assume that when pack_padded_sequence is called with Tensors, the PackedSequence.data will also be a Tensor, but here a Function converts it to a Variable. I'm not sure how this should be handled.

If it's OK, I'll change these tests tomorrow.

@apaszke
Contributor

apaszke commented Jan 1, 2018

Sorry, I forgot that you'd already mentioned this! There's no hurry, take your time.

yongjik and others added 14 commits January 6, 2018 23:00
* allow_inf on test_beta_log_prob
* Support allow_inf on assertAlmostEqual

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
- as_variable no longer needs to be an instance function
- mark functions as static
* Improve matmul native test tolerance.

Because we don't directly use bmm in one case of matmul, a comparison to bmm doesn't make sense;
instead, we compare to the double result.

* Fix spelling.
Adds a missing bias term to the __repr__ functions of the
Linear and Bilinear modules. Fixes the spacing in the Conv2d
__repr__ to make it consistent with other modules.
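
For what it's worth, with a recent PyTorch the repr of these modules reports the bias flag; the exact output format shown here is an assumption about current versions, not this commit's output:

>>> import torch.nn as nn
>>> nn.Linear(3, 5, bias=False)
Linear(in_features=3, out_features=5, bias=False)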
* Support ATen GPU pointwise apply and torch.where.

Like the CPU version, this implements an apply template that is almost identical to the
apply template already in THC, but using the ATen API.  Much of this involves stripping out
the TensorUtils code (which is basically templated ATen-style), although a couple of functions
remain that are apply specific (and thus don't seem worth porting to ATen), namely
overlappingIndices, canUse32BitIndexMath, and getTensorInfo.  We can make those generally
available if there's a need.

* Use int64_t instead of ptrdiff_t.

* Use snake case for _copyIgnoringOverlaps_.
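
As a quick reference for the user-facing piece, torch.where picks elements from two tensors based on a condition; a minimal usage sketch with a recent PyTorch:

import torch

cond = torch.tensor([True, False, True])
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([10.0, 20.0, 30.0])
torch.where(cond, a, b)   # tensor([ 1., 20.,  3.])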
Currently 1-layer RNN is supported
ezyang and others added 27 commits January 6, 2018 23:00
* Delete obsolete basic ops.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* More deletion.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete some unused utilities.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete dead apply_fn

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete CppFunction symbolic support.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete ForwardFunction

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Batchnorm is 'working'

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* add gumbel_softmax, based on Eric Jang's implementation

* Make gumbel_softmax CUDA friendly

* gumbel_softmax tweaks
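
For context, the Gumbel-softmax trick perturbs the logits with Gumbel(0, 1) noise and applies a temperature-scaled softmax; a self-contained sketch of the idea, not necessarily the code added here:

import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0, eps=1e-20):
    # Gumbel(0, 1) noise via -log(-log(U)) with U ~ Uniform(0, 1)
    u = torch.rand_like(logits)
    gumbel = -torch.log(-torch.log(u + eps) + eps)
    # Softmax over the perturbed logits gives a differentiable relaxation
    return F.softmax((logits + gumbel) / tau, dim=-1)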
* Deprecate nn.NLLLoss2d

* Fix legacy tests

* Fix tests

* Remove NLLLoss2d from docs, add deprecation warning instead of error

* fix lint

* Add more to docs
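
The replacement path is plain nn.NLLLoss, which in current PyTorch accepts spatial inputs directly; a small sketch:

import torch
import torch.nn as nn

# (N, C, H, W) log-probabilities against (N, H, W) class-index targets
log_probs = torch.log_softmax(torch.randn(2, 4, 8, 8), dim=1)
target = torch.randint(0, 4, (2, 8, 8))
loss = nn.NLLLoss()(log_probs, target)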
* Add test for empty Variable cat (forward only).

* Test for empty cat (no grad/gradgrad checks)

* Support gradcheck on empty inputs, check it for cat with an empty Variable.

* Fix lint.
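
A minimal example of the kind of check this enables, assuming a recent PyTorch:

import torch
from torch.autograd import gradcheck

empty = torch.randn(0, 3, dtype=torch.double, requires_grad=True)
full = torch.randn(2, 3, dtype=torch.double, requires_grad=True)
gradcheck(lambda a, b: torch.cat([a, b]), (empty, full))   # returns True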
….cwrap. (#4479)

The specification and logic aren't necessary anymore; it's fine to specify the default as nullptr.
BCELoss's outputs and gradInput computations are accurate to around 1e-6 on float types (as a relative value, not absolute), which is reasonable. However, the tests use absolute thresholds: the accumulation of 5 gradInputs has to have error less than 0.0002.

The worst case for BCELoss's gradInput for each element may be described as 1 / ((1 - x) * x). Previously, the input to the test was restricted to [0.02, 1 - 0.02], giving a worst-case largest gradInput of about 50, a total accumulated grad of 50 * 5 = 250, and therefore an error of 250 * 1e-6 = 0.00025, which was too big.

By restricting x to [0.028, 1 - 0.028] we get a worst case of 36.74, resulting in a total accumulated grad of 184, which is less than the 200 needed to keep the error below 0.0002.
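
A quick numeric check of that reasoning (per-element worst case 1 / ((1 - x) * x), five accumulated gradInputs, ~1e-6 relative accuracy):

old_bound, new_bound = 0.02, 0.028
worst_old = 1 / ((1 - old_bound) * old_bound)   # ~51.0 -> 5 * 51.0 * 1e-6 ~ 0.00026 > 0.0002
worst_new = 1 / ((1 - new_bound) * new_bound)   # ~36.74 -> 5 * 36.74 * 1e-6 ~ 0.00018 < 0.0002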
This mismatched paren causes a syntax error in generated code. I'm guessing the parentheses are necessary, since there was one in there before, but I don't actually know whether the compiler can produce things like a - (b - c) that would make them required.
* Supporting logits as parameters in Bernoulli and Categorical

* address comments

* fix lint

* modify binary_cross_entropy_with_logits

* address comments

* add descriptor for lazy attributes

* address comments
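
Usage-wise, the logits parameterization looks like this (standard torch.distributions API):

import torch
from torch.distributions import Bernoulli, Categorical

# Parameterize by logits instead of probabilities
b = Bernoulli(logits=torch.tensor([0.5, -1.0]))
c = Categorical(logits=torch.randn(3, 5))
sample = c.sample()
log_p = c.log_prob(sample)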
Basically, scalars and implicitly unsqueezed.
- Out-of-bounds grads[2] access (thnn_conv_depthwise2d_backward doesn't compute the bias gradient)

- Groups was not set appropriately for depthwise convolution

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
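
For reference, a depthwise convolution is an ordinary Conv2d with groups equal to the number of input channels; a small sketch that exercises the backward path touched here (whether the dedicated depthwise kernel is hit depends on the backend):

import torch
import torch.nn as nn

# One filter per input channel: groups == in_channels
conv = nn.Conv2d(in_channels=8, out_channels=8, kernel_size=3, groups=8, bias=True)
out = conv(torch.randn(1, 8, 16, 16))
out.sum().backward()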
Weight can be non-contiguous due to double backwards, where
we transpose the weight.  I'm not very happy with this fix
but it seems to make the tests pass.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This test would have caught the OOB in thnn_conv_depthwise2d_backward

Fixes #4457

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Previously, we only tested CPU double-backwards, which is bad! This would have caught #4422 (still not fixed, so those tests
are manually disabled) and also uncovered #4500 (not yet diagnosed).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
…tion

Signed-off-by: Edward Z. Yang <ezyang@fb.com>