[pytorch] bump up variable version regardless of differentiability #41269
Conversation
The ultimate goal is to move things that are not gated with `if (compute_requires_grad(...))` or `if (grad_fn)` out from VariableType so that VariableType kernels can be enabled/disabled based upon `GradMode`. Then we can merge `AutoNonVariableTypeMode` and `NoGradGuard`.

We've moved profiling / tracing logic out from VariableType. The only remaining thing that's not gated with the if-statement is the `increment_version` call. However, `gen_variable_type.py` does use bits from `derivatives.yaml` to determine whether to emit the `increment_version` call: if an output is never going to be differentiable (not based upon a runtime property of the variable but upon a static property, e.g. it's an integral type), then the `increment_version` call is never emitted for it.

Hypothetically, `increment_version` for a tensor can be orthogonal to its differentiability. This PR makes that change and tests its impact. Making this logical simplification would allow us to move the call out from VariableType to the aten codegen.

Differential Revision: [D22471643](https://our.internmc.facebook.com/intern/diff/D22471643/)
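For background (not part of this PR's diff), the version counter is the mechanism autograd uses to detect that a tensor saved for backward was later mutated in place; a minimal sketch of the standard behavior:

```python
# Minimal sketch of the version-counter mechanism this PR touches
# (standard PyTorch behavior, shown here only as context for the PR).
import torch

x = torch.ones(3, requires_grad=True)
y = x.exp()          # exp saves its output for the backward pass
print(y._version)    # 0
y.add_(1)            # in-place op: the VariableType kernel bumps y's version counter
print(y._version)    # 1
# y.sum().backward() would now raise:
# RuntimeError: one of the variables needed for gradient computation has been
# modified by an inplace operation ...
```

The question in this PR is only about which outputs get that bump emitted by the codegen, not about the mechanism itself.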
💊 CI failures summary and remediations

As of commit 3f32b4e (more details on the Dr. CI page):

🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
```python
body.append(emit_call(env, tie_return_values))
if strategy == 'use_derived':
    body.extend(emit_increment_version())
```
So the idea here is: if the op in question is composite, we DON'T emit the `increment_version` call (so that the constituent pieces can take care of it).
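As an illustration of that intuition (mine, not code from the PR): a composite op implemented in terms of an in-place constituent does not need its own bump, because the constituent's kernel already performs it.

```python
# Hypothetical composite wrapper around an in-place constituent; the wrapper
# itself never calls increment_version because add_ already bumps the counter.
import torch

def composite_add_one_(x):
    x.add_(1)        # the in-place constituent bumps x._version internally
    return x

t = torch.zeros(3)
v0 = t._version
composite_add_one_(t)
assert t._version == v0 + 1   # bumped by the constituent, not by the wrapper
```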
```diff
 if not modifies_arguments:
     return []
-return ['increment_version({});'.format(arg['name']) for arg in differentiable_outputs]
+return ['increment_version({});'.format(arg['name']) for arg in returns]
```
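To make the effect of this one-line change concrete, here is a small sketch (illustrative names, not actual codegen output) of the strings that would be emitted for an in-place/out op with a mixed differentiable/non-differentiable return pair such as values/indices:

```python
# Illustrative only: the arg dicts and output names are made up to mirror the
# shape of the data gen_variable_type.py works with.
differentiable_outputs = [{'name': 'values'}]             # indices is integral, so excluded
returns = [{'name': 'values'}, {'name': 'indices'}]       # every returned Tensor&

# Before this PR: only differentiable outputs get a version bump.
before = ['increment_version({});'.format(arg['name']) for arg in differentiable_outputs]
# -> ['increment_version(values);']

# After this PR: every returned (possibly mutated) tensor gets a version bump.
after = ['increment_version({});'.format(arg['name']) for arg in returns]
# -> ['increment_version(values);', 'increment_version(indices);']
```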
Why is this changed? Does it actually matter?
Do we have inplace ops that return multiple things? And if so, do we have some that return a mix of differentiable/non-differentiable outputs?
We should increment the version on all tensors that get mutated, not just the differentiable ones. You can save non-differentiable tensors as part of a backward formula...
But I don't expect that these other outputs are actually modified inplace! We would be bumping the version of a Tensor we don't change inplace.
Also this is BC-breaking right?
> Why is this changed? Does it actually matter?
> Do we have inplace ops that return multiple things? And if so, do we have some that return a mix of differentiable/non-differentiable outputs?
There are cases where an op returns multiple things that are a mix of differentiable and non-differentiable outputs, e.g.:
ljk53@9ab9e4b#diff-79b1a31c97eee8dda9e0dae02162beecR2986
Non-differentiable outputs are usually things like indices.
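A small self-contained example (mine, not from the linked diff) of such a mixed pair:

```python
# values is floating point and participates in autograd; indices is integral
# and can never be differentiable, regardless of runtime state.
import torch

x = torch.randn(4, requires_grad=True)
values, indices = torch.max(x, dim=0)
print(values.dtype, values.requires_grad)    # torch.float32 True
print(indices.dtype, indices.requires_grad)  # torch.int64 False
```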
Oh right, these out=(val, ind) functions...
Ok!
> But I don't expect that these other outputs are actually modified inplace! We would be bumping the version of a Tensor we don't change inplace.
> Also this is BC-breaking right?
This is a good point - just to clarify: regarding BC breakage, do you mean:

- code that uses `x._version` to read the variable version explicitly, or
- code broken by falsely incrementing the version of tensors that are not actually updated?

I call out "falsely" bumping because it's harmful: the breakage is totally unnecessary, compared to truly bumping, which is technically more correct behavior - if truly bumping breaks any code, then it probably reveals bugs.

It seems we made an effort to differentiate `Tensor&` and `const Tensor&`, and `returns` at this place only contains the `Tensor&` ones - so I assumed that these params are possibly mutated and bumped their versions. From an eyeballing check, it seems that in most cases the returns are indeed mutated, but I haven't verified all cases - do you know whether this is the right assumption?
> code that uses `x._version` to read the variable version explicitly
This one is ok to break as this is an internal API. You might need to update a couple of tests but it should be fine overall (even though we want to avoid breaking it often).
> code broken by falsely incrementing the version of tensors that are not actually updated?
There are indeed two cases here:

- Cases where we were not incrementing while we should:

```python
import torch
from torch.utils import checkpoint

a = torch.ones(10, 10, requires_grad=True)
b, ind = a.max(dim=0)

with torch.no_grad():
    if False:
        ind += 1  # Raises an error as expected
    elif True:
        # No error and wrong grad
        t = torch.zeros(10)
        t[2] = 1
        torch.cummax(t, dim=0, out=(torch.Tensor(), ind))
    else:
        pass

b.sum().backward()
print(a.grad)
```

- Cases where we should not increment, as the second result is not modified inplace. Not sure if this happens in practice. In any case, we should be able to tell from the signature whether it is modified or not (the signature is (should be) always right!).
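For reference, a sketch of what the first repro above should do once the version of every mutated return is bumped; the assertion below reflects the behavior this PR intends, not something verified here:

```python
# Sketch of the intended post-PR behavior for the reviewer's repro:
# the out= call now bumps ind's version, so the later backward is caught.
import torch

a = torch.ones(10, 10, requires_grad=True)
b, ind = a.max(dim=0)          # max saves ind for its backward
v0 = ind._version

with torch.no_grad():
    t = torch.zeros(10)
    t[2] = 1
    torch.cummax(t, dim=0, out=(torch.Tensor(), ind))  # mutates ind in place

assert ind._version > v0       # intended after this PR: the integral output is bumped too
# b.sum().backward() should now raise the usual
# "one of the variables needed for gradient computation has been modified by an
#  inplace operation" error instead of silently producing a wrong a.grad.
```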
What about the view handling via the
But differentiability impacts the way we call the function (composite or not), and the way we call the function impacts whether we call increment_version or not.
Hm, yes, this is probably more long-tail stuff we will have to handle.
albanD left a comment
Sounds good.
Please do add a note about BC-breaking changes this leads to in the main comment so that we can add it to the release doc.
Thanks for reviewing and approving this PR! I was actually using this PR as an experiment vehicle to trigger CIs, so I didn't add any reviewers - I guess I should have marked it as WIP, lol... I might still experiment with something else before landing it.
Thinking more about it, I would agree that increment should not be associated with the differentiability of a function.
This pull request has been merged in 01c406c.