Implement backward for pack_padded_sequence #4341
Conversation
|
@pytorchbot test this please |
|
I think the CI is failing because it used to work with Tensors, which is no longer the case. I don't think it's worth maintaining that support since it's not documented, and tensors are going to be merged with Variables before 0.4, so it's not a big deal. Can you please fix the tests? |
|
Hey Adam, I'm so sorry for the lack of response, I was on a short vacation without proper internet access.
If it's OK, I'll change these tests tomorrow. |
|
Sorry I forgot that you've already mentioned this! There's no hurry, take your time |
* allow_inf on test_beta_log_prob
* Support allow_inf on assertAlmostEqual

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
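For illustration, an `allow_inf`-aware almost-equal check might look like the sketch below; the helper name and signature here are hypothetical, not the actual test-suite API.

```python
import math

def assert_almost_equal(actual, expected, tol=1e-5, allow_inf=False):
    # When allow_inf is set, matching infinities (same sign) are accepted
    # instead of failing the closeness check.
    if allow_inf and math.isinf(actual) and math.isinf(expected):
        assert actual == expected, "infinities differ in sign"
        return
    assert abs(actual - expected) <= tol, "%r != %r within %r" % (actual, expected, tol)
```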
- as_variable no longer needs to be an instance function
- mark functions as static
* Improve matmul native test tolerance. Because we don't directly use bmm in one case of matmul, a comparison to bmm doesn't make sense; instead, we compare to the double result.
* Fix spelling.
Adds a missing bias term to the __repr__ functions of the Linear and Bilinear modules. Fixes the spacing in the Conv2d __repr__ to make it consistent with other modules.
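As a rough illustration of the fix (this is a hypothetical subclass, not the code from the commit), including the bias flag in a module's printed summary looks like this:

```python
import torch.nn as nn

class MyLinear(nn.Linear):
    # Hypothetical example: report whether the module has a bias term,
    # in the same style as other nn module summaries.
    def __repr__(self):
        return '{}(in_features={}, out_features={}, bias={})'.format(
            self.__class__.__name__, self.in_features, self.out_features,
            self.bias is not None)

print(MyLinear(3, 4))               # MyLinear(in_features=3, out_features=4, bias=True)
print(MyLinear(3, 4, bias=False))   # MyLinear(in_features=3, out_features=4, bias=False)
```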
* Support ATen GPU pointwise apply and torch.where. Like the CPU version, this implements an apply template that is almost identical to the apply template already in THC, but using the ATen API. Much of this involves stripping out the TensorUtils code (which is basically templated ATen-style), although a couple of functions remain that are apply-specific (and thus don't seem worth porting to ATen), namely overlappingIndices, canUse32BitIndexMath, and getTensorInfo. We can make those generally available if there's a need.
* Use int64_t instead of ptrdiff_t.
* Use snake case for _copyIgnoringOverlaps_.
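As a quick usage reminder (not part of the diff itself), torch.where selects elementwise between two tensors based on a condition, on CPU or GPU:

```python
import torch

x = torch.randn(4)
y = torch.zeros(4)
# Keep positive entries of x, replace the rest with the corresponding entry of y.
out = torch.where(x > 0, x, y)
```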
Currently, 1-layer RNNs are supported.
* Delete obsolete basic ops. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* More deletion. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Delete some unused utilities. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Delete dead apply_fn Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Delete CppFunction symbolic support. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Delete ForwardFunction Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Batchnorm is 'working' Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* add gumbel_softmax, based on Eric Jang's implementation
* Make gumbel_softmax CUDA friendly
* gumbel_softmax tweaks
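For readers unfamiliar with the trick, here is a minimal sketch of Gumbel-softmax sampling in the spirit of Eric Jang's recipe; it is illustrative only, not the exact code added by this commit.

```python
import torch
import torch.nn.functional as F

def sample_gumbel(shape, eps=1e-10):
    # Gumbel(0, 1) noise via the inverse-CDF transform of uniform samples.
    u = torch.rand(shape)
    return -torch.log(-torch.log(u + eps) + eps)

def gumbel_softmax_sample(logits, tau=1.0):
    # Add Gumbel noise to the logits and take a temperature-controlled softmax;
    # lower tau pushes the samples closer to one-hot.
    noisy = logits + sample_gumbel(logits.size())
    return F.softmax(noisy / tau, dim=-1)
```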
* Deprecate nn.NLLLoss2d
* Fix legacy tests
* Fix tests
* Remove NLLLoss2d from docs, add deprecation warning instead of error
* fix lint
* Add more to docs
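As a hedged migration sketch (assuming a current PyTorch where the deprecation has landed), nn.NLLLoss accepts spatial (N, C, H, W) log-probabilities directly, so code using NLLLoss2d can typically switch like this:

```python
import torch
import torch.nn as nn

log_probs = torch.randn(2, 5, 8, 8).log_softmax(dim=1)  # (N, C, H, W) log-probabilities
target = torch.randint(0, 5, (2, 8, 8))                 # (N, H, W) class indices

loss = nn.NLLLoss()(log_probs, target)  # replaces nn.NLLLoss2d()(log_probs, target)
```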
* Add test for empty Variable cat (forward only).
* Test for empty cat (no grad/gradgrad checks)
* Support gradcheck on empty inputs, check it for cat with an empty Variable.
* Fix lint.
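For context, this is the kind of case being exercised (a sketch assuming a current PyTorch where empty tensors are first-class, not the test code itself):

```python
import torch
from torch.autograd import gradcheck

x = torch.randn(2, 3, dtype=torch.double, requires_grad=True)
empty = torch.empty(0, dtype=torch.double, requires_grad=True)

# Concatenating an empty tensor should leave the result and its gradients unchanged.
assert gradcheck(lambda a, b: torch.cat([a.view(-1), b]), (x, empty))
```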
….cwrap. (#4479) The specification and logic aren't necessary anymore; it's fine to specify the default as nullptr.
BCELoss's outputs and gradInput computations are accurate to around 1e-6 on float types (as a relative value, not absolute), which is reasonable. However, the tests use absolute thresholds: the accumulation of 5 gradInputs has to have error less than 0.0002. The worst case for BCELoss's gradInput for each element is 1 / ((1 - x) * x). Previously, the input to the test was restricted to [0.02, 1 - 0.02], giving a worst-case largest gradInput of about 50, a total accumulated grad of 50 * 5 = 250, and an error of 250 * 1e-6 = 0.00025, which was too big. By restricting x to [0.028, 1 - 0.028] we get a worst case of 36.74, resulting in a total accumulated grad of about 184, which is less than the 200 needed to have error less than 0.0002.
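The bound above can be checked in a couple of lines (a quick sanity check, not part of the patch):

```python
# Worst-case |d BCE / dx| per element is 1 / ((1 - x) * x), which is largest
# at the ends of the allowed input range [margin, 1 - margin].
def worst_case_grad(margin):
    return 1.0 / ((1.0 - margin) * margin)

print(worst_case_grad(0.02))   # ~51.0  -> 5 * 51.0 * 1e-6  ~ 0.000255 > 0.0002 (too big)
print(worst_case_grad(0.028))  # ~36.74 -> 5 * 36.74 * 1e-6 ~ 0.000184 < 0.0002 (ok)
```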
This mismatched paren causes a syntax error in generated code. I'm guessing the parentheses are necessary, since there was one in there before, but I don't actually know whether the compiler can produce things like a - (b - c) that would make them required.
* Supporting logits as parameters in Bernoulli and Categorical
* address comments
* fix lint
* modify binary_cross_entropy_with_logits
* address comments
* add descriptor for lazy attributes
* address comments
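As a usage note (not code from this PR), the logits parameterization in a current torch.distributions looks like this:

```python
import torch
from torch.distributions import Bernoulli, Categorical

logits = torch.tensor([0.5, -1.0, 2.0])

b = Bernoulli(logits=logits)    # probs derived internally via sigmoid(logits)
c = Categorical(logits=logits)  # probs derived internally via softmax(logits)

print(b.sample(), b.log_prob(torch.tensor([1., 0., 1.])))
print(c.sample(), c.log_prob(torch.tensor(2)))
```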
Basically, scalars are implicitly unsqueezed.
- Out of bounds grads[2] access (thnn_conv_depthwise2d_backward doesn't compute bias gradient)
- Groups was not set appropriately for depthwise convolution

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Weight can be non-contiguous due to double backwards, where we transpose the weight. I'm not very happy with this fix but it seems to make the tests pass. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This test would have caught the OOB in thnn_conv_depthwise2d_backward. Fixes #4457. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
…tion Signed-off-by: Edward Z. Yang <ezyang@fb.com>
On my machine this PR gives the following on CPU:
and GPU
I've implemented this in Python since I couldn't get ATen to accept the batch_sizes as input to backward. There's an ugly conversion from a Variable to a list when creating the PackedSequence tuple.

Also, two tests are failing, because they assume that when pack_padded_sequence is called with Tensors, the PackedSequence.data will also be a Tensor, but here a Function converts it to a Variable. Not sure how this should be handled.

Please let me know if this PR makes sense and if there's anything I could fix. Oh, and also whether I should somehow concatenate these commits into a single one.
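For reference, a rough sketch of how a Python-side autograd Function for packing might look; the class name and details below are illustrative assumptions, not the exact code in this PR.

```python
import torch
from torch.autograd import Function

class PackPadded(Function):
    """Hypothetical sketch: pack a (T, B, ...) padded batch given batch_sizes,
    and scatter the packed gradient back to the padded layout in backward."""

    @staticmethod
    def forward(ctx, input, batch_sizes):
        ctx.batch_sizes = batch_sizes   # plain Python list kept on the ctx
        ctx.input_size = input.size()
        # At time step t only the first batch_sizes[t] sequences are still active.
        chunks = [input[t, :bs] for t, bs in enumerate(batch_sizes)]
        return torch.cat(chunks, dim=0)

    @staticmethod
    def backward(ctx, grad_output):
        # Scatter the packed gradient back into the padded (T, B, ...) layout.
        grad_input = grad_output.new_zeros(ctx.input_size)
        offset = 0
        for t, bs in enumerate(ctx.batch_sizes):
            grad_input[t, :bs] = grad_output[offset:offset + bs]
            offset += bs
        return grad_input, None         # no gradient w.r.t. batch_sizes
```

Usage would be something along the lines of packed = PackPadded.apply(padded_input, batch_sizes), with batch_sizes kept as a plain list so no gradient is required for it.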