@ezyang ezyang commented Jan 3, 2018

CUDA double backwards was broken, and we didn't know about it. Fix it.

Each commit is a logical unit of work.

commit 9ae56c3bb4a26e01269031d4205263952ceadc79
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Wed Jan 3 09:27:24 2018 -0800

    Actually test CUDA double-backwards codepath.
    
    Previously, we only tested CPU double-backwards, which is bad!
    This would have caught #4422 (still not fixed, so those tests
    are manually disabled).
    
    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
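For context, double-backwards correctness is what PyTorch's `torch.autograd.gradgradcheck` verifies: analytic second derivatives are compared against finite differences. The core idea, reduced to a one-variable pure-Python sketch (hypothetical helper, not PyTorch code):

```python
def second_derivative_fd(f, x, eps=1e-5):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + eps) - 2.0 * f(x) + f(x - eps)) / (eps * eps)

# f(x) = x**3 has f''(x) = 6x, so f''(2) = 12.
approx = second_derivative_fd(lambda x: x ** 3, 2.0)
assert abs(approx - 12.0) < 1e-3
```

This commit extends that style of check to CUDA tensors; previously it ran only against the CPU codepath.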

commit e80a9949c19ba979b63e8faeb984b34c6929d4c2
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Wed Jan 3 08:14:51 2018 -0800

    Check for out of bounds grads access in derivatives.yaml
    
    This test would have caught the OOB in thnn_conv_depthwise2d_backward
    
    Fixes #4457
    
    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
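The check being added can be sketched as a small validator: given a derivative formula and the number of differentiable outputs the function actually produces, reject any `grads[i]` reference past the end. A hypothetical pure-Python sketch, not the actual derivatives.yaml machinery:

```python
import re

def check_grads_bounds(formula, num_outputs):
    """Reject derivative formulas that index grads[] past the
    number of differentiable outputs (hypothetical checker)."""
    for m in re.finditer(r"grads\[(\d+)\]", formula):
        idx = int(m.group(1))
        if idx >= num_outputs:
            raise ValueError(
                "grads[%d] out of bounds: only %d output(s)" % (idx, num_outputs))

# thnn_conv_depthwise2d_backward produces no bias gradient, so only
# grads[0] and grads[1] exist; a formula touching grads[2] is rejected.
check_grads_bounds("grads[0].t() + grads[1]", 2)  # OK
try:
    check_grads_bounds("grads[2]", 2)
    raised = False
except ValueError:
    raised = True
assert raised
```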

commit c8b34c46b508c3636d60a9bd297e755baeff7b20
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Wed Jan 3 08:14:27 2018 -0800

    s/uses_grad/uses_single_grad/ for more clarity.
    
    Signed-off-by: Edward Z. Yang <ezyang@fb.com>

commit 563936c3b81b4c0a37e4348d36af2265f98feabf
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Wed Jan 3 09:18:18 2018 -0800

    Fix 'invalid argument 4: weight tensor has to be contiguous'
    
    Weight can be non-contiguous due to double backwards, where
    we transpose the weight.  I'm not very happy with this fix
    but it seems to make the tests pass.
    
    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
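The error arises because transposing a tensor produces a view with permuted strides rather than a copy, so a kernel that assumes contiguous memory rejects it. The same behavior can be demonstrated with NumPy, where `np.ascontiguousarray` plays the role of PyTorch's `Tensor.contiguous()`:

```python
import numpy as np

w = np.ones((4, 3, 5, 5))            # e.g. an out x in x kH x kW weight
wt = w.transpose(1, 0, 2, 3)         # transpose is a view with new strides
assert not wt.flags["C_CONTIGUOUS"]  # a contiguity-checking kernel rejects this

w_fixed = np.ascontiguousarray(wt)   # analogue of Tensor.contiguous(): a copy
assert w_fixed.flags["C_CONTIGUOUS"]
assert (w_fixed == wt).all()         # same values, contiguous layout
```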

commit af1283c24c9d39465342654cd4b9b2eca6d1d9c6
Author: Edward Z. Yang <ezyang@fb.com>
Date:   Wed Jan 3 08:39:04 2018 -0800

    Fix two bugs in thnn_conv_depthwise2d_backward gradient.
    
    - Out of bounds grads[2] access (thnn_conv_depthwise2d_backward
      doesn't compute bias gradient)
    
    - Groups was not set appropriately for depthwise convolution
    
    Signed-off-by: Edward Z. Yang <ezyang@fb.com>
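For a depthwise convolution, `groups` must equal the number of input channels, so each input channel is convolved only with its own filter rather than with every filter. A minimal pure-Python 1-D sketch of that grouping (hypothetical helper, valid-mode convolution):

```python
def depthwise_conv1d(x, weight):
    """1-D convolution with groups == number of channels:
    channel c of the input sees only filter c."""
    out = []
    for xc, wc in zip(x, weight):
        k = len(wc)
        out.append([sum(wc[j] * xc[i + j] for j in range(k))
                    for i in range(len(xc) - k + 1)])
    return out

x = [[1, 2, 3], [4, 5, 6]]   # 2 channels, length 3
w = [[1, 1], [2, 0]]         # one length-2 filter per channel
# channel 0: [1+2, 2+3]; channel 1: [2*4, 2*5]
assert depthwise_conv1d(x, w) == [[3, 5], [8, 10]]
```

If `groups` were left at its default of 1, every filter would see every input channel, which is a different (and here wrong) operation.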

@pytorchbot
@ezyang, thanks for your PR! We identified @zdevito to be a potential reviewer.

This comment was marked as off-topic.


ezyang added 3 commits January 5, 2018 13:51
- Out of bounds grads[2] access (thnn_conv_depthwise2d_backward
  doesn't compute bias gradient)

- Groups was not set appropriately for depthwise convolution

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Weight can be non-contiguous due to double backwards, where
we transpose the weight.  I'm not very happy with this fix
but it seems to make the tests pass.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
@ezyang force-pushed the pr/cuda-double-backwards branch from 3514b49 to 1e1858c on January 5, 2018 at 21:53
ezyang added 3 commits January 5, 2018 14:06
This test would have caught the OOB in thnn_conv_depthwise2d_backward

Fixes pytorch#4457

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Previously, we only tested CPU double-backwards, which is bad!
This would have caught pytorch#4422 (still not fixed, so those tests
are manually disabled) and also uncovered pytorch#4500 (not yet diagnosed).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
…tion

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
@ezyang force-pushed the pr/cuda-double-backwards branch from 1e1858c to 68f7227 on January 5, 2018 at 22:06
@ezyang ezyang merged commit 21d48be into pytorch:master Jan 6, 2018
