[pytorch] bump up variable version regardless of differentiability #41269
Conversation
The ultimate goal is to move things that are not gated with `if (compute_requires_grad(...))` or `if (grad_fn)` out from VariableType so that VariableType kernels can be enabled/disabled based upon `GradMode`. Then we can merge `AutoNonVariableTypeMode` and `NoGradGuard`.

We've moved profiling / tracing logic out from VariableType. The only remaining thing that's not gated with the if-statement is the `increment_version` call. However, `gen_variable_type.py` does use bits from `derivatives.yaml` to determine whether to emit the `increment_version` call: if an output is never going to be differentiable (not based upon a runtime property of the variable but upon a static property, e.g. it's an integral type), then the `increment_version` call is never emitted for it.

Hypothetically, `increment_version` for a tensor can be orthogonal to its differentiability. This PR makes that change and tests its impact. Making this logical simplification would allow us to move the call out from VariableType to the aten codegen.

Differential Revision: [D22471643](https://our.internmc.facebook.com/intern/diff/D22471643/)
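For background (not part of this PR's diff), the version counter is the mechanism autograd uses to detect that a tensor saved for backward was later mutated in place; a minimal sketch of the standard behavior:

```python
# Minimal sketch of the version-counter mechanism this PR touches
# (standard PyTorch behavior, shown here only as context for the PR).
import torch

x = torch.ones(3, requires_grad=True)
y = x.exp()          # exp saves its output for the backward pass
print(y._version)    # 0
y.add_(1)            # in-place op: the VariableType kernel bumps y's version counter
print(y._version)    # 1
# y.sum().backward() would now raise:
# RuntimeError: one of the variables needed for gradient computation has been
# modified by an inplace operation ...
```

The question in this PR is only about which outputs get that bump emitted by the codegen, not about the mechanism itself.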
💊 CI failures summary and remediations

As of commit 3f32b4e (more details on the Dr. CI page):

🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
```python
body.append(emit_call(env, tie_return_values))
if strategy == 'use_derived':
    body.extend(emit_increment_version())
```
So the idea here is: if the op in question is composite, we DON'T emit the `increment_version` call (so that the constituent pieces can take care of it).
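As an illustration of that intuition (mine, not code from the PR): a composite op implemented in terms of an in-place constituent does not need its own bump, because the constituent's kernel already performs it.

```python
# Hypothetical composite wrapper around an in-place constituent; the wrapper
# itself never calls increment_version because add_ already bumps the counter.
import torch

def composite_add_one_(x):
    x.add_(1)        # the in-place constituent bumps x._version internally
    return x

t = torch.zeros(3)
v0 = t._version
composite_add_one_(t)
assert t._version == v0 + 1   # bumped by the constituent, not by the wrapper
```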
```diff
 if not modifies_arguments:
     return []
-return ['increment_version({});'.format(arg['name']) for arg in differentiable_outputs]
+return ['increment_version({});'.format(arg['name']) for arg in returns]
```
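To make the effect of this one-line change concrete, here is a small sketch (illustrative names, not actual codegen output) of the strings that would be emitted for an in-place/out op with a mixed differentiable/non-differentiable return pair such as values/indices:

```python
# Illustrative only: the arg dicts and output names are made up to mirror the
# shape of the data gen_variable_type.py works with.
differentiable_outputs = [{'name': 'values'}]             # indices is integral, so excluded
returns = [{'name': 'values'}, {'name': 'indices'}]       # every returned Tensor&

# Before this PR: only differentiable outputs get a version bump.
before = ['increment_version({});'.format(arg['name']) for arg in differentiable_outputs]
# -> ['increment_version(values);']

# After this PR: every returned (possibly mutated) tensor gets a version bump.
after = ['increment_version({});'.format(arg['name']) for arg in returns]
# -> ['increment_version(values);', 'increment_version(indices);']
```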
Why is this changed? Does it actually matter?
Do we have inplace ops that return multiple things? And if so, do we have some that return a mix of differentiable/non-differentiable outputs?
We should increment the version on all tensors that get mutated, not just the differentiable ones. You can save non-differentiable tensors as part of a backward formula...
But I don't expect that these other outputs are actually modified inplace! We would be bumping the version of a Tensor we don't change inplace.
Also this is BC-breaking right?
> Why is this changed? Does it actually matter?
> Do we have inplace ops that return multiple things? And if so, do we have some that return a mix of differentiable/non-differentiable outputs?
There are cases where an op returns multiple things that are a mix of differentiable and non-differentiable outputs, e.g.:
ljk53@9ab9e4b#diff-79b1a31c97eee8dda9e0dae02162beecR2986
Non-differentiable outputs are usually things like indices.
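A small self-contained example (mine, not from the linked diff) of such a mixed pair:

```python
# values is floating point and participates in autograd; indices is integral
# and can never be differentiable, regardless of runtime state.
import torch

x = torch.randn(4, requires_grad=True)
values, indices = torch.max(x, dim=0)
print(values.dtype, values.requires_grad)    # torch.float32 True
print(indices.dtype, indices.requires_grad)  # torch.int64 False
```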
Oh right, these out=(val, ind) functions...
Ok!
> But I don't expect that these other outputs are actually modified inplace! We would be bumping the version of a Tensor we don't change inplace.
> Also this is BC-breaking right?
This is a good point - just to clarify: regarding BC breakage, do you mean:

- code that uses `x._version` to read the variable version explicitly, or
- code broken by falsely incrementing the version of tensors that are not actually updated?

I call out "falsely" bumping because it's harmful: the breakage is totally unnecessary, compared to truly bumping, which is technically more correct behavior - if truly bumping breaks any code, then it probably reveals bugs.

It seems we made an effort to differentiate `Tensor&` and `const Tensor&`, and `returns` at this place only contains the `Tensor&` ones - so I assumed that these params are possibly mutated and bumped their versions. From an eyeballing check, it seems that in most cases the returns are indeed mutated, but I haven't verified all cases - do you know whether this is the right assumption?
> code that uses `x._version` to read the variable version explicitly
This one is ok to break as this is an internal API. You might need to update a couple of tests but it should be fine overall (even though we want to avoid breaking it often).
> code broken by falsely incrementing the version of tensors that are not actually updated?
There are indeed two cases here:

- Cases where we were not incrementing while we should:

```python
import torch
from torch.utils import checkpoint

a = torch.ones(10, 10, requires_grad=True)
b, ind = a.max(dim=0)

with torch.no_grad():
    if False:
        ind += 1  # Raises an error as expected
    elif True:
        # No error and wrong grad
        t = torch.zeros(10)
        t[2] = 1
        torch.cummax(t, dim=0, out=(torch.Tensor(), ind))
    else:
        pass

b.sum().backward()
print(a.grad)
```

- Cases where we should not increment, as the second result is not modified inplace. Not sure if this happens in practice. In any case, we should be able to tell from the signature whether it is modified or not (the signature is (should be) always right!).
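For reference, a sketch of what the first repro above should do once the version of every mutated return is bumped; the assertion below reflects the behavior this PR intends, not something verified here:

```python
# Sketch of the intended post-PR behavior for the reviewer's repro:
# the out= call now bumps ind's version, so the later backward is caught.
import torch

a = torch.ones(10, 10, requires_grad=True)
b, ind = a.max(dim=0)          # max saves ind for its backward
v0 = ind._version

with torch.no_grad():
    t = torch.zeros(10)
    t[2] = 1
    torch.cummax(t, dim=0, out=(torch.Tensor(), ind))  # mutates ind in place

assert ind._version > v0       # intended after this PR: the integral output is bumped too
# b.sum().backward() should now raise the usual
# "one of the variables needed for gradient computation has been modified by an
#  inplace operation" error instead of silently producing a wrong a.grad.
```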
What about the view handling via the
But differentiability impacts the way we call the function (composite or not), and the way we call the function impacts whether we call increment_version or not.
Hm, yes, this is probably more long-tail stuff we will have to handle.
albanD left a comment
Sounds good.
Please do add a note about BC-breaking changes this leads to in the main comment so that we can add it to the release doc.
Thanks for reviewing and approving this PR! I was actually using this PR as an experiment vehicle to trigger CIs, so I didn't add any reviewers - I guess I should have marked it as WIP, lol... I might still experiment with something else before landing it.
Thinking more about it, I would agree that increment should not be associated with the differentiability of a function.
This pull request has been merged in 01c406c.