@ezyang ezyang commented Jan 3, 2018

CUDA double backwards was broken, and we didn't know about it. Fix it.

Each commit is a logical unit of work.

commit 9ae56c3bb4a26e01269031d4205263952ceadc79
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Wed Jan 3 09:27:24 2018 -0800

    Actually test CUDA double-backwards codepath.
    
    Previously, we only tested CPU double-backwards, which is bad!
    This would have caught #4422 (still not fixed, so those tests
    are manually disabled).
    
    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
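For context, double-backwards correctness is what PyTorch's `torch.autograd.gradgradcheck` verifies: analytic second derivatives are compared against finite differences. The core idea, reduced to a one-variable pure-Python sketch (hypothetical helper, not PyTorch code):

```python
def second_derivative_fd(f, x, eps=1e-5):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + eps) - 2.0 * f(x) + f(x - eps)) / (eps * eps)

# f(x) = x**3 has f''(x) = 6x, so f''(2) = 12.
approx = second_derivative_fd(lambda x: x ** 3, 2.0)
assert abs(approx - 12.0) < 1e-3
```

This commit extends that style of check to CUDA tensors; previously it ran only against the CPU codepath.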

commit e80a9949c19ba979b63e8faeb984b34c6929d4c2
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Wed Jan 3 08:14:51 2018 -0800

    Check for out of bounds grads access in derivatives.yaml
    
    This test would have caught the OOB in thnn_conv_depthwise2d_backward
    
    Fixes #4457
    
    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
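The check being added can be sketched as a small validator: given a derivative formula and the number of differentiable outputs the function actually produces, reject any `grads[i]` reference past the end. A hypothetical pure-Python sketch, not the actual derivatives.yaml machinery:

```python
import re

def check_grads_bounds(formula, num_outputs):
    """Reject derivative formulas that index grads[] past the
    number of differentiable outputs (hypothetical checker)."""
    for m in re.finditer(r"grads\[(\d+)\]", formula):
        idx = int(m.group(1))
        if idx >= num_outputs:
            raise ValueError(
                "grads[%d] out of bounds: only %d output(s)" % (idx, num_outputs))

# thnn_conv_depthwise2d_backward produces no bias gradient, so only
# grads[0] and grads[1] exist; a formula touching grads[2] is rejected.
check_grads_bounds("grads[0].t() + grads[1]", 2)  # OK
try:
    check_grads_bounds("grads[2]", 2)
    raised = False
except ValueError:
    raised = True
assert raised
```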

commit c8b34c46b508c3636d60a9bd297e755baeff7b20
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Wed Jan 3 08:14:27 2018 -0800

    s/uses_grad/uses_single_grad/ for more clarity.
    
    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

commit 563936c3b81b4c0a37e4348d36af2265f98feabf
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Wed Jan 3 09:18:18 2018 -0800

    Fix 'invalid argument 4: weight tensor has to be contiguous'
    
    Weight can be non-contiguous due to double backwards, where
    we transpose the weight.  I'm not very happy with this fix
    but it seems to make the tests pass.
    
    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
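The error arises because transposing a tensor produces a view with permuted strides rather than a copy, so a kernel that assumes contiguous memory rejects it. The same behavior can be demonstrated with NumPy, where `np.ascontiguousarray` plays the role of PyTorch's `Tensor.contiguous()`:

```python
import numpy as np

w = np.ones((4, 3, 5, 5))            # e.g. an out x in x kH x kW weight
wt = w.transpose(1, 0, 2, 3)         # transpose is a view with new strides
assert not wt.flags["C_CONTIGUOUS"]  # a contiguity-checking kernel rejects this

w_fixed = np.ascontiguousarray(wt)   # analogue of Tensor.contiguous(): a copy
assert w_fixed.flags["C_CONTIGUOUS"]
assert (w_fixed == wt).all()         # same values, contiguous layout
```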

commit af1283c24c9d39465342654cd4b9b2eca6d1d9c6
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Wed Jan 3 08:39:04 2018 -0800

    Fix two bugs in thnn_conv_depthwise2d_backward gradient.
    
    - Out of bounds grads[2] access (thnn_conv_depthwise2d_backward
      doesn't compute bias gradient)
    
    - Groups was not set appropriately for depthwise convolution
    
    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
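For a depthwise convolution, `groups` must equal the number of input channels, so each input channel is convolved only with its own filter rather than with every filter. A minimal pure-Python 1-D sketch of that grouping (hypothetical helper, valid-mode convolution):

```python
def depthwise_conv1d(x, weight):
    """1-D convolution with groups == number of channels:
    channel c of the input sees only filter c."""
    out = []
    for xc, wc in zip(x, weight):
        k = len(wc)
        out.append([sum(wc[j] * xc[i + j] for j in range(k))
                    for i in range(len(xc) - k + 1)])
    return out

x = [[1, 2, 3], [4, 5, 6]]   # 2 channels, length 3
w = [[1, 1], [2, 0]]         # one length-2 filter per channel
# channel 0: [1+2, 2+3]; channel 1: [2*4, 2*5]
assert depthwise_conv1d(x, w) == [[3, 5], [8, 10]]
```

If `groups` were left at its default of 1, every filter would see every input channel, which is a different (and here wrong) operation.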

@pytorchbot
@ezyang, thanks for your PR! We identified @zdevito to be a potential reviewer.

This comment was marked as off-topic.


ezyang added 3 commits January 5, 2018 13:51
- Out of bounds grads[2] access (thnn_conv_depthwise2d_backward
  doesn't compute bias gradient)

- Groups was not set appropriately for depthwise convolution

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Weight can be non-contiguous due to double backwards, where
we transpose the weight.  I'm not very happy with this fix
but it seems to make the tests pass.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
@ezyang force-pushed the pr/cuda-double-backwards branch from 3514b49 to 1e1858c on January 5, 2018 at 21:53
ezyang added 3 commits January 5, 2018 14:06
This test would have caught the OOB in thnn_conv_depthwise2d_backward

Fixes pytorch#4457

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Previously, we only tested CPU double-backwards, which is bad!
This would have caught pytorch#4422 (still not fixed, so those tests
are manually disabled) and also uncovered pytorch#4500 (not yet diagnosed).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
…tion

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
@ezyang force-pushed the pr/cuda-double-backwards branch from 1e1858c to 68f7227 on January 5, 2018 at 22:06
@ezyang ezyang merged commit 21d48be into pytorch:master Jan 6, 2018
