Conversation

@xionghuaidong

Fixes #45558

This assertion failure is caused by the incorrect implementation of aten::set_grad_enabled in torch/csrc/jit/runtime/register_special_ops.cpp. The current implementation is:

Operator(
    "aten::set_grad_enabled(bool val) -> ()",
    [](Stack* stack) {
      torch::GradMode::set_enabled(pop(stack).toBool());
      push(stack, IValue());
    },
    aliasAnalysisConservative()),

which pushes a None onto the evaluation stack after calling set_enabled. According to the signature, this behavior is incorrect: the signature says this function returns no value. I guess the original author may have been confused by the behavior of Python, which implicitly returns None when a function definition does not end with a return statement carrying an explicit value.
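
For comparison, the Python behavior in question can be demonstrated with a trivial function (`set_flag` is a made-up name, for illustration only):

```python
# A Python function whose body has no explicit "return <value>"
# implicitly returns None to its caller.
def set_flag(val):
    flag = val  # do some work, but never return anything explicitly

print(set_flag(True))  # → None
```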

If aten::set_grad_enabled pushes a None onto the evaluation stack, then each time it is called the stack accumulates an extra None. In our case, with torch.no_grad(): causes aten::set_grad_enabled to be called twice (once on entry and once on exit), so when the forward method finishes, the evaluation stack is [None, None, Tensor]. But the return statement of GraphFunction::operator() in torch/csrc/jit/api/function_impl.cpp is return stack.front();, which tries to extract a tensor out of a None and thus triggers the assertion failure.
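
To make the accumulation concrete, here is a minimal pure-Python sketch of the interpreter's evaluation stack (every name here is hypothetical; the real interpreter is the C++ code above):

```python
# Hypothetical pure-Python model of the JIT evaluation stack (a plain list).

def buggy_set_grad_enabled(stack):
    stack.pop()          # consume the bool argument
    # ... GradMode would be toggled here ...
    stack.append(None)   # BUG: pushes a None despite the "-> ()" schema

stack = []
stack.append(False); buggy_set_grad_enabled(stack)  # entering no_grad
stack.append(True);  buggy_set_grad_enabled(stack)  # exiting no_grad
stack.append("Tensor")                              # forward's actual result
print(stack)  # → [None, None, 'Tensor']; stack[0] (the front) is None, not the Tensor
```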

The fix is simple: remove the push from the implementation of aten::set_grad_enabled.

@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Sep 30, 2020

dr-ci bot commented Sep 30, 2020

💊 CI failures summary and remediations

As of commit 4feb8fc (more details on the Dr. CI page):



❄️ 2 failures tentatively classified as flaky, but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_macos_10_13_py3_build (1/2)

Step: "Update Homebrew" ❄️

fatal: Could not read from remote repository.
Receiving objects:  98% (175/178) Receiving objects:  99% (177/178) Receiving objects: 100% (178/178) Receiving objects: 100% (178/178), 63.90 KiB | 10.65 MiB/s, done. 
Resolving deltas:  96% (89/92) Resolving deltas:  97% (90/92) Resolving deltas: 100% (92/92) Resolving deltas: 100% (92/92), completed with 85 local objects. 
From ssh://github.com/Homebrew/homebrew-cask-versions 
 + 15f6b44...90ed6b8 master     -> origin/master  (forced update) 
+ git reset --hard origin/master 
HEAD is now at 90ed6b8 Update microsoft-edge-beta from 86.0.622.19 to 86.0.622.28 (#9686) 
+ for path in '$(find /usr/local/Homebrew -type d -name .git)' 
+ cd /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/.git/.. 
+ git fetch --depth=1 origin 
Connection to github.com closed by remote host.  
fatal: Could not read from remote repository. 
 
Please make sure you have the correct access rights 
and the repository exists. 

See CircleCI build pytorch_ios_11_2_1_x86_64_build (2/2)

Step: "Update Homebrew" ❄️

fatal: Could not read from remote repository.
(remainder of the log is identical to the pytorch_macos_10_13_py3_build failure above)

Extra GitHub checks: 1 failed



@mrshenli mrshenli added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Oct 2, 2020
@mrshenli mrshenli requested a review from suo October 2, 2020 00:07
@SplitInfinity SplitInfinity requested review from SplitInfinity and removed request for suo October 6, 2020 17:20
    "aten::set_grad_enabled(bool val) -> ()",
    [](Stack* stack) {
      torch::GradMode::set_enabled(pop(stack).toBool());
      push(stack, IValue());


@eellison I remember we talked about this during the no_grad PR:
(screenshot from the earlier no_grad PR discussion)

I agree with the PR author's assessment: what we thought was the return value of torch.set_grad_enabled(False) is actually just Python's implicit None for a function with no explicit return value. But I wanted to double-check.

Contributor

@eellison eellison Oct 6, 2020


Yeah, my suggestion was to push None onto the stack, but we should also have updated the schema to reflect that.

Contributor


No return value is also fine. I do want to double-check that the operator really has no return value when it is annotated with (), though. (Printing the IR graph should show us.)

Contributor


Confirmed

graph():
  %2 : None = prim::Constant() # test/elias.py:5:0
  %0 : bool = prim::Constant[value=1]() # test/elias.py:6:27
   = aten::set_grad_enabled(%0) # test/elias.py:6:4
  return (%2)

Contributor

@eellison eellison left a comment


LGTM! Thx for the PR! It would be nice if we could have caught this automatically, maybe with a debug build that asserts the stack is the expected size after each operator. This isn't the first time we've had an extra stack value...

Contributor

@facebook-github-bot facebook-github-bot left a comment


@SplitInfinity has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@xionghuaidong
Author

> LGTM! Thx for the PR! It would be nice if we could have caught this automatically, maybe with a debug build that asserts the stack is the expected size after each operator. This isn't the first time we've had an extra stack value...

Checking the stack size after each operator is a bit heavy, so a debug build is the appropriate place for it. We could also add an always-on check in GraphFunction::operator() in torch/csrc/jit/api/function_impl.cpp that throws an exception if the final stack size is not 1. That check alone would have caught this bug.
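
Such an always-on check could look roughly like this; a hedged Python sketch with made-up names (run_graph_function, double_it), not the actual C++ code:

```python
# Hypothetical sketch of a stack-size sanity check after graph execution.
def run_graph_function(ops, inputs):
    stack = list(inputs)
    for op in ops:
        op(stack)
    # A graph function must leave exactly one return value on the stack;
    # anything else means some operator left the stack unbalanced.
    if len(stack) != 1:
        raise RuntimeError(f"expected stack size 1, got {len(stack)}")
    return stack[0]  # corresponds to C++'s stack.front()

double_it = lambda stack: stack.append(stack.pop() * 2)  # well-behaved op
print(run_graph_function([double_it], [21]))  # → 42
```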

SplitInfinity pushed a commit that referenced this pull request Oct 8, 2020
…45559)

Summary:
Fixes #45558

This assertion failure is caused by the incorrect implementation of ``aten::set_grad_enabled`` in [torch/csrc/jit/runtime/register_special_ops.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/runtime/register_special_ops.cpp#L436). The current implementation is:

```cpp
Operator(
    "aten::set_grad_enabled(bool val) -> ()",
    [](Stack* stack) {
      torch::GradMode::set_enabled(pop(stack).toBool());
      push(stack, IValue());
    },
    aliasAnalysisConservative()),
```

which pushes a ``None`` onto the evaluation stack after calling ``set_enabled``. According to the signature, this behavior is incorrect: the signature says this function returns no value. I guess the original author may have been confused by the behavior of Python, which implicitly returns ``None`` when a function definition does not end with a return statement carrying an explicit value.

If ``aten::set_grad_enabled`` pushes a ``None`` onto the evaluation stack, then each time it is called the stack accumulates an extra ``None``. In our case, ``with torch.no_grad():`` causes ``aten::set_grad_enabled`` to be called twice (once on entry and once on exit), so when the ``forward`` method finishes, the evaluation stack is ``[None, None, Tensor]``. But the return statement of ``GraphFunction::operator()`` in [torch/csrc/jit/api/function_impl.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/api/function_impl.cpp#L51) is ``return stack.front();``, which tries to extract a tensor out of a ``None`` and thus triggers the assertion failure.

The fix is simple: remove the push from the implementation of ``aten::set_grad_enabled``.

Pull Request resolved: #45559

Reviewed By: albanD

Differential Revision: D24142153

Pulled By: SplitInfinity

fbshipit-source-id: 75aad0e38bd912a437f7e1a1ee89ab4445e35b5d
@facebook-github-bot
Contributor

@SplitInfinity merged this pull request in e3112e3.

malfet pushed a commit that referenced this pull request Oct 12, 2020
…45559) (#46060)

(commit message identical to the one above)

Co-authored-by: huaidong.xiong <huaidong.xiong@mobvista.com>

Closes #45558: [JIT] TorchScript runtime Expected Tensor but got None