Conversation

@ssnl
Collaborator

@ssnl ssnl commented May 20, 2018

* Change backward calls to grad to avoid memory leak from #7343
* Replace unnecessary create_graph=True with retain_graph=True

The memory leak is blocking #7270.
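
For reference, here is a minimal sketch (not the gradcheck code itself) of the distinction this relies on: grad() returns gradients directly instead of accumulating into .grad, retain_graph=True only keeps the graph alive for another backward pass, and create_graph=True additionally records the backward pass so it can itself be differentiated.

import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()

# retain_graph=True: the graph of y survives so it can be differentiated again,
# but no graph is recorded for the gradient computation itself.
g1, = torch.autograd.grad(y, x, retain_graph=True)

# create_graph=True: additionally records the backward pass, which is only
# needed when the gradient itself will be differentiated (double backward).
g2, = torch.autograd.grad(y, x, create_graph=True)
gg, = torch.autograd.grad(g2.sum(), x)

# y.backward() would instead accumulate into x.grad and leave that state
# behind; grad() returns the gradients without touching x.grad.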

@ssnl
Collaborator Author

ssnl commented May 20, 2018

The current gradgradcheck is broken because the inputs containing grad_outputs are not actually used in the function. So when they are non-contiguous and gradcheck makes them contiguous, things break.

>>> # print numerical and analytical grad wrt the grad_output
>>> torch.autograd.gradgradcheck(lambda x: x, [torch.randn(3).requires_grad_()], gen_non_contig_grad_outputs=False)
(tensor([[ 1.0133,  0.0000,  0.0000],
        [ 0.0000,  1.0133,  0.0000],
        [ 0.0000,  0.0000,  1.0133]]), tensor([[ 1.,  0.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.]]))

>>> torch.autograd.gradgradcheck(lambda x: x, [torch.randn(3).requires_grad_()], gen_non_contig_grad_outputs=True)
(tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]]), tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]]))

Also fixing this.
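
In spirit, the fix is to make grad_outputs genuine inputs of the function that gradcheck differentiates, so that perturbing them or changing their layout actually affects the result. A rough sketch of that wrapping follows (illustrative names, not the exact code in this PR; the real helper does the same kind of split, as the new_func snippet further down shows). Both the inputs and the grad_outputs are assumed to be double-precision tensors with requires_grad=True, as gradcheck expects.

import torch

def gradgradcheck_sketch(fn, inputs, grad_outputs):
    num_outputs = len(grad_outputs)

    def new_func(*args):
        # grad_outputs are passed in as trailing arguments, so gradcheck's
        # perturbations and contiguity changes are actually used below.
        input_args = args[:-num_outputs]
        g_outs = args[-num_outputs:]
        outputs = fn(*input_args)
        if isinstance(outputs, torch.Tensor):
            outputs = (outputs,)
        return torch.autograd.grad(outputs, input_args, g_outs, create_graph=True)

    return torch.autograd.gradcheck(new_func, tuple(inputs) + tuple(grad_outputs))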

input = contiguous(input)
target = contiguous(target)
if not all(t.is_contiguous() for t in iter_tensors(target)):
    raise RuntimeError("target must only contain contiguous Tensors")

input_args = tuple(x for x in input_args if isinstance(x, torch.Tensor) and x.requires_grad)
grad_inputs = torch.autograd.grad(outputs, input_args, grad_outputs, create_graph=True)
grad_inputs = torch.autograd.grad(outputs, input_args, grad_outputs,
                                  create_graph=True, only_inputs=True, allow_unused=True)
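
Aside, not part of the diff: with allow_unused=True, grad() returns None for inputs the outputs do not depend on instead of raising an error. A tiny illustration:

import torch

a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
out = (a * 2).sum()   # b never enters the graph

ga, gb = torch.autograd.grad(out, (a, b), allow_unused=True)
# ga is a tensor of 2s; gb is None, because b is unused and allow_unused=True
# permits that instead of raising an error.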

'although analytical gradient matches numerical gradient')

# check if the backward multiplies by grad_output
zero_gradients(inputs)
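
A small illustration (not from the diff) of what the comment above checks: a correct backward must scale by the incoming grad_output, so backwarding with zeros must yield exactly zero gradients.

import torch

x = torch.randn(3, requires_grad=True)
y = x * 3

# If backward correctly multiplies by grad_output, a zero grad_output gives a zero gradient.
g, = torch.autograd.grad(y, x, torch.zeros_like(y))
print(g)   # tensor([0., 0., 0.])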

@ssnl
Collaborator Author

ssnl commented May 21, 2018

I just got rid of the contiguity enforcement altogether. Hope that everything still works.

ggO,
ggW_maybe_squeeze.expand_as(gO) * gO * nonpositive_mask,
(ggI * gO * nonpositive_mask).sum()
(ggI * gO * nonpositive_mask).sum().expand_as(weight)
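
Context, as I read the fix: with PReLU's single shared weight, the .sum() reduction left ggW as a 0-dim scalar, and .expand_as(weight) restores the weight's shape. A toy shape check (the contrib name is just a stand-in for ggI * gO * nonpositive_mask):

import torch

weight = torch.randn(1)       # PReLU's single shared weight has shape (1,)
contrib = torch.randn(4, 5)   # stand-in for ggI * gO * nonpositive_mask

print(contrib.sum().shape)                    # torch.Size([])  -- a 0-dim scalar
print(contrib.sum().expand_as(weight).shape)  # torch.Size([1]) -- matches weight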

@ssnl
Collaborator Author

ssnl commented May 21, 2018

@apaszke This is ready for review.

Contributor

@apaszke apaszke left a comment

LGTM. One concern about gradgradcheck that would be good to clarify before merging.

# need data here to get around the version check because without .data,
# the following code updates version but doesn't change content
x_tensor = x_tensor.data
for d_idx, x_idx in enumerate(product(*[range(m) for m in x_tensor.size()])):
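
A tiny illustration of the version-counter point in the comment above; it uses the private Tensor._version attribute, so treat it purely as a debugging sketch:

import torch

x = torch.randn(3)
d = x.detach()       # shares storage and the version counter with x
d[0] = 1.0
print(x._version)    # bumped -> autograd's version check would notice this write

raw = x.data         # shares storage, but writes bypass x's version counter
raw[1] = 2.0
print(x._version)    # unchanged -> the write is invisible to the version check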

var.requires_grad = True
return var
y = torch.testing.make_non_contiguous(y).requires_grad_()
return y

def new_func(*args):
    input_args = args[:-num_outputs]
    grad_outputs = args[-num_outputs:]

@ssnl ssnl merged commit e3e15b5 into pytorch:master May 23, 2018
@ssnl ssnl deleted the improve_gradcheck branch May 23, 2018 15:03
petrex pushed a commit to petrex/pytorch that referenced this pull request May 23, 2018
…e2_core_hip

* 'caffe2_core_hip' of github.com:petrex/pytorch: (24 commits)
  Allow empty storage for the 'Edge' class. (pytorch#7595)
  Process group base class and Gloo implementation (pytorch#7628)
  _LRSchedulers getstate include optimizer info (pytorch#7757)
  [PyTorch] [gradcheck] change backward() to grad() (pytorch#7710)
  Update test_nn.py (pytorch#7787)
  Define general default scheduler for TBB and fix ppc64le bug (pytorch#7761)
  Add support for accepting Tensor as input in clip_grad_*  functions. (pytorch#7769)
  [Easy] Remove unused code (pytorch#7782)
  Update tbb (pytorch#7734)
  Add @generated annotation (pytorch#7780)
  fix legacy comment after variable tensor merge (pytorch#7771)
  Revert pytorch#7750 and pytorch#7762 to fix Windows CI on master (pytorch#7772)
  Temporarily disable build env check (pytorch#7768)
  Add missing brace (pytorch#7762)
  [C++ API] Add backward() to Tensor and Variable  (pytorch#7750)
  [auto] Update onnx to d43b550 - Fix .gitignore and add missing files (onnx/onnx#1005) onnx/onnx@d43b550
  [auto] Update onnx to ea1aa13 - add tests for reduce ops (onnx/onnx#675) onnx/onnx@ea1aa13
  include cudnn_h (pytorch#7749)
  [C++ API] Using new registration mechanism (pytorch#7663)
  [auto] Update onnx to 5dd68e6 - Add a util function: polish_model (onnx/onnx#1000) onnx/onnx@5dd68e6
  ...
weiyangfb pushed a commit to weiyangfb/pytorch that referenced this pull request Jun 11, 2018
* Change backward calls to grad to avoid memory leak from pytorch#7343; Replace unnecessary create_graph=True with retain_graph=True

* fix gradgradcheck use of make_non_contiguous

* allow non-contiguous target

* remove unnecessary .grad.zero_()

* remove contiguous_detach

* fix PReLU double backward always returning ggW as a scalar

* let noncontig gO require grad

* move requires_grad to return