Improve hooks ordering behavior #85849

Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85849
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Failure. As of commit 07b732c: FLAKY - the following jobs failed but were likely due to flakiness present on master.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Notable points:

- With this PR, PyFunctionTensorPreHooks registered to a leaf tensor will not change the input you receive in the post hook, but I think that is just the status quo.
- We'd like pre hooks (but not post hooks) to be able to modify the gradients before they are captured (this replicates what happens when accumulate_grad=true).
- Hooks registered to tensors (PyFunctionTensorPreHooks, as opposed to PyFunction(Pre|Post)Hooks) are not registered to the accumulate grad node. They are saved on the tensor's autograd_meta and accessed exclusively by AccumulateGrad::apply. I'm not actually sure why this is the case yet. Maybe we should fix it?

Alternative design:

- Actually set the needed_ of the outputs to true even when accumulate_grad=false, but also modify AccumulateGrad::apply to behave as a no-op when this happens. This seemed easier at first, but I ended up not doing it because I wanted to avoid the TLS accesses.

[ghstack-poisoned]
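For illustration, here is a minimal Python sketch (not code from the PR; exact semantics depend on the PyTorch version) of a tensor pre-hook rewriting the gradient before it is accumulated into .grad, which is the accumulate_grad=true path the comment refers to:

```python
import torch

w = torch.ones(3, requires_grad=True)
# Tensor pre-hook: runs before AccumulateGrad writes into w.grad and may
# return a replacement gradient.
w.register_hook(lambda g: 2 * g)

loss = (3 * w).sum()
loss.backward()
print(w.grad)  # tensor([6., 6., 6.]) -- 3 from the op, doubled by the hook
```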
albanD
left a comment
Sounds good.
This is BC-breaking though?
new_grad = (*hook)({new_grad})[0];
}
return new_grad;
std::vector<std::unique_ptr<FunctionPreHook>>& tensor_pre_hooks() noexcept
Shouldn't we have the same for retain_grad hook? Or is it assumed that leaf Tensors can't have retain_grad hooks?
Yes, I think it is ok to assume that.
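A small Python sketch of that assumption (illustrative only, not from this PR): retain_grad() on a leaf tensor is effectively a no-op, since leaves already get .grad populated through AccumulateGrad, whereas non-leaf tensors need it.

```python
import torch

leaf = torch.randn(2, requires_grad=True)
leaf.retain_grad()      # effectively a no-op: leaves already populate .grad
mid = leaf * 2
mid.retain_grad()       # needed: non-leaf gradients are not kept by default

mid.sum().backward()
print(leaf.grad)        # populated via AccumulateGrad as usual
print(mid.grad)         # populated only because of retain_grad()
```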
// Returns the inputs as a list of variables. Destroys given InputBuffer.
static std::vector<Variable> variables(InputBuffer&& g);

private:
Why?
Let me see if I can avoid this.
I don't think we can easily avoid this for now. It should be gone for sure though when we eventually remove the usage of capture hooks, so it should just be a temporary hack.
var->mutable_grad() = var->grad() + grad;
}
}
return at::TensorBase{};
What is the contract on the returned value here?
If the return value is defined, then it will modify the gradient; if it is undefined, the hook is treated as only "reading" the value of the gradient.
See:
pytorch/torch/csrc/autograd/cpp_hook.cpp
Lines 38 to 40 in 645fb21
if (!res.defined()) {
  // Don't change gradient
  continue;
// we will additionally register a single hook to the grad_fn.
//
// Note that the cpp and python use cases aren't actually aware of
// each other, so using both is not defined behavior.
What would happen in practice before this PR? They will just run in arbitrary order?
What about after this PR?
From the Python side, the hooks_ field is cleared every time the backward_hooks field is set, and a PyFunctionHook is registered.
From the cpp side, the hooks_ field is set whenever a cpp hook is registered AND cpp_hooks_list_ is empty. (That is also the case when we register a retains_grad hook, but it shouldn't matter because hooks_ is only used on leaf nodes and retains_grad is a no-op when invoked on a leaf node.)
What happens is that one will be overwritten by the other.
This is still the case after this PR, but I'm okay with not caring about the mixed Python + cpp case.
Ho wow this is very badly broken :O good to know!
Addresses: #35802

Design doc: https://docs.google.com/document/d/19xSib7FFknRQ5f3ptGFUmiOt3BrgXSUlTQH2xMcZJYg/edit#

### Changes in this PR

#### Implementation
- We now have 3 fields: pre_hooks, retains_grad_hooks, and tensor_pre_hooks, so that we can more precisely define their ordering and when they are executed.
- Since retains grad uses an entirely new field, we cannot reuse the old retains grad logic. We refactor retains grad to call directly into the variable.cpp logic. Other logic in variable.cpp that handles cpp hooks must also be updated.

#### Hooks ordering and execution
- Defines pre-hooks registered on a tensor to run before pre-hooks registered on grad_fn.
- Updates pre-hooks registered on a tensor to always run, even if the tensor is passed as inputs= to .grad().
- Post hooks (and pre hooks) can now observe the modifications to the gradient made by the tensor pre-hook.

#### Retains grad hooks
- retains_grad hooks always execute last, even if there are other tensor pre-hooks registered.

#### Unchanged
- pre_hooks registered to grad_fn aren't expected to execute if the corresponding tensor is passed as inputs= to .grad().

Follow ups:
- simplify the retains_grad field to not be a vector, since it always holds a single hook
- potentially merge capture hooks with tensor pre-hooks; this would involve some additional refactoring
- the behavior of Python hooks registered to a tensor on in-place ops is still wrong

[ghstack-poisoned]
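As a hedged illustration of the new behavior for inputs= to .grad() described above (a sketch, not a test from this PR; exact results depend on the PyTorch version this lands in):

```python
import torch

x = torch.ones(2, requires_grad=True)
y = 4 * x

calls = []

def tensor_pre_hook(grad):
    calls.append("tensor pre-hook on y")
    return grad * 2          # the modification should be visible in the capture

y.register_hook(tensor_pre_hook)

# With this PR, the tensor pre-hook runs even though y is passed as inputs=,
# and the captured gradient reflects its modification.
(gy,) = torch.autograd.grad(y.sum(), inputs=(y,))
print(calls)  # ["tensor pre-hook on y"]
print(gy)     # expected: tensor([2., 2.]) -- 1 from sum(), doubled by the hook
```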
albanD
left a comment
lint needs fixing but sounds good otherwise!
@pytorchbot merge -f "Unrelated failures"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
return pre_hooks_;
}

virtual std::vector<std::unique_ptr<FunctionPreHook>>&
Could this virtual be an error? How is tensor_pre_hooks_ different from retains_grad_hooks_?
> How is tensor_pre_hooks_ different from retains_grad_hooks_?

There's a class that inherits from Node that overrides tensor_pre_hooks_.
@pytorchbot revert -h

❌ 🤖 pytorchbot command failed: Try

@pytorchbot revert -m "fails internal build" -c nosignal

@pytorchbot successfully started a revert job. Check the current status here.

@soulitzer your PR has been successfully reverted.
This reverts commit 049838f. Reverted #85849 on behalf of https://github.com/albanD due to fails internal build
This reverts commit e525f43. [ghstack-poisoned]
This reverts commit e525f43.

Original PR: #85849
Fixes #ISSUE_NUMBER

In addition to reverting the revert, this PR:
- defines the virtual destructor of FunctionPreHook in the header. Why? Presumably the internal build imports the header from somewhere, but does not have function_hooks.cpp (where the virtual destructor was previously defined) in the same compilation unit.

Pull Request resolved: #92559
Approved by: https://github.com/albanD
Stack from ghstack (oldest at bottom):
Addresses: #35802
Design doc: https://docs.google.com/document/d/19xSib7FFknRQ5f3ptGFUmiOt3BrgXSUlTQH2xMcZJYg/edit#
Changes in this PR

Implementation
- We now have 3 fields: pre_hooks, retains_grad_hooks, and tensor_pre_hooks, so that we can more precisely define their ordering and when they are executed.
- Since retains grad uses an entirely new field, we cannot reuse the old retains grad logic. We refactor retains grad to call directly into the variable.cpp logic. Other logic in variable.cpp that handles cpp hooks must also be updated.

Hooks ordering and execution:
- Defines pre-hooks registered on a tensor to run before pre-hooks registered on grad_fn.
- Updates pre-hooks registered on a tensor to always run, even if the tensor is passed as inputs= to .grad().
- Post hooks (and pre hooks) can now observe the modifications to the gradient made by the tensor pre-hook.

Retains grad hooks
- retains_grad hooks always execute last, even if there are other tensor pre-hooks registered.

Unchanged:
- pre_hooks registered to grad_fn aren't expected to execute if the corresponding tensor is passed as inputs= to .grad().

Follow ups:
- simplify the retains_grad field to not be a vector, since it always holds a single hook
- potentially merge capture hooks with tensor pre-hooks; this would involve some additional refactoring
- the behavior of Python hooks registered to a tensor on in-place ops is still wrong
cc @ezyang @gchanan
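A hedged end-to-end sketch of the "retains grad hooks always execute last" rule (illustrative only; not a test from this PR, and behavior depends on the PyTorch version it lands in):

```python
import torch

x = torch.ones(3, requires_grad=True)
y = 5 * x

y.retain_grad()                    # registered first...
y.register_hook(lambda g: g * 2)   # ...but the tensor pre-hook still runs first

y.sum().backward()
print(y.grad)  # expected: tensor([2., 2., 2.]) -- retain_grad sees the doubled grad
print(x.grad)  # expected: tensor([10., 10., 10.]) -- downstream also sees it
```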