
@SplitInfinity commented on Oct 8, 2020

Cherry-pick of #45559 into release/1.7
Summary
Fixes #45558

This assertion failure is caused by the incorrect implementation of ``aten::set_grad_enabled`` in [torch/csrc/jit/runtime/register_special_ops.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/runtime/register_special_ops.cpp#L436). The current implementation is:

```cpp
Operator(
    "aten::set_grad_enabled(bool val) -> ()",
    [](Stack* stack) {
      torch::GradMode::set_enabled(pop(stack).toBool());
      push(stack, IValue());
    },
    aliasAnalysisConservative()),
```

which pushes a ``None`` onto the evaluation stack after calling ``set_enabled``. This is incorrect: the schema ``-> ()`` declares that the operator returns no value, so nothing should be pushed. The original author may have been misled by Python's behavior, where a function that ends without an explicit ``return`` value implicitly returns ``None``.

Because ``aten::set_grad_enabled`` pushes a ``None`` every time it is called, the evaluation stack accumulates one extra ``None`` per call. In our case, ``with torch.no_grad():`` causes ``aten::set_grad_enabled`` to be called twice (once on entering and once on exiting the block), so when the ``forward`` method finishes, the evaluation stack is ``[None, None, Tensor]``. But ``GraphFunction::operator()`` in [torch/csrc/jit/api/function_impl.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/api/function_impl.cpp#L51) ends with ``return stack.front();``, which tries to extract a tensor out of a ``None`` and thus triggers the assertion failure.
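To make the resulting stack state concrete, here is a small standalone sketch. It is not PyTorch interpreter source; it only mimics the stack contents described above using the public ``c10::IValue`` API, and the ``torch::ones`` tensor is just a placeholder return value:

```cpp
#include <torch/torch.h>
#include <vector>
#include <iostream>

int main() {
  // The JIT interpreter's evaluation stack is a std::vector<c10::IValue>.
  std::vector<c10::IValue> stack;

  // Entering and exiting `with torch.no_grad():` each invoke
  // aten::set_grad_enabled, and the buggy implementation pushes a None both times.
  stack.emplace_back();                      // None left behind on __enter__
  stack.emplace_back();                      // None left behind on __exit__
  stack.emplace_back(torch::ones({2, 2}));   // the value forward actually returns

  // GraphFunction::operator() effectively does `return stack.front();`,
  // so the caller receives a None where it expects a Tensor and the
  // subsequent tensor extraction asserts.
  std::cout << "front is None:  " << stack.front().isNone() << "\n";  // prints 1
  std::cout << "back is Tensor: " << stack.back().isTensor() << "\n"; // prints 1
}
```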

The solution is simple: remove the ``push`` from the implementation of ``aten::set_grad_enabled``.
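Concretely, the fix amounts to dropping that ``push``; a sketch of the corrected registration (the committed change may differ in cosmetic details):

```cpp
Operator(
    "aten::set_grad_enabled(bool val) -> ()",
    [](Stack* stack) {
      // Pop the bool argument and toggle grad mode; push nothing back,
      // matching the `-> ()` schema, which declares no return value.
      torch::GradMode::set_enabled(pop(stack).toBool());
    },
    aliasAnalysisConservative()),
```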

Pull Request resolved: #45559

Reviewed By: albanD

Differential Revision: D24142153

Pulled By: SplitInfinity

fbshipit-source-id: 75aad0e38bd912a437f7e1a1ee89ab4445e35b5d

@SplitInfinity requested a review from apaszke as a code owner on October 8, 2020 22:23
@facebook-github-bot added the ``oncall: jit`` label (Add this issue/PR to JIT oncall triage queue) on Oct 8, 2020

dr-ci bot commented Oct 8, 2020

💊 CI failures summary and remediations

As of commit cad7759 (more details on the Dr. CI page):


- 2/2 failures possibly introduced in this PR
  - 2/2 non-CircleCI failure(s)

Extra GitHub checks: 1 failed
- ci.pytorch.org: 1 failed
This comment was automatically generated by Dr. CI.



codecov bot commented Oct 9, 2020

Codecov Report

Merging #46060 into release/1.7 will not change coverage.
The diff coverage is n/a.


```
@@             Coverage Diff              @@
##           release/1.7   #46060   +/-   ##
============================================
  Coverage        68.46%   68.46%
============================================
  Files              406      406
  Lines            52267    52267
============================================
  Hits             35785    35785
  Misses           16482    16482
```

Powered by Codecov. Last update 333daf0...cad7759.

@malfet changed the title from "aten::set_grad_enabled should not push as it does not return a value …" to "[Release/1.7] aten::set_grad_enabled should not push as it does not return a value" on Oct 12, 2020
@malfet merged commit ff9e565 into release/1.7 on Oct 12, 2020
@facebook-github-bot deleted the no-grad-jit-cherry-pick branch on January 27, 2021 18:27