Conversation

@soulitzer
Contributor

@soulitzer soulitzer commented Sep 28, 2022

Stack from ghstack (oldest at bottom):

Addresses: #35802

Design doc: https://docs.google.com/document/d/19xSib7FFknRQ5f3ptGFUmiOt3BrgXSUlTQH2xMcZJYg/edit#

Changes in this PR

Implementation

  • We now have three fields: pre_hooks, retains_grad_hooks, and tensor_pre_hooks, so that we can more precisely define their ordering and when they are executed.
  • Since retains_grad now uses an entirely new field, we cannot reuse the old retains_grad logic. We refactor retains_grad to call directly into the variable.cpp logic. Other logic in variable.cpp that handles cpp hooks must also be updated.

Hooks ordering and execution:

  • Defines pre-hooks registered on a tensor to run before pre-hooks registered on its grad_fn
  • Updates pre-hooks registered on a tensor to always run, even if the tensor is passed as inputs= to .grad()
  • Post hooks (and pre hooks) can now observe the modifications made to the gradient by the tensor pre-hooks (a sketch illustrating this ordering follows the "Unchanged" section below)

Retains grad hooks

  • retains_grad hooks always execute last, even if there are other tensor pre-hooks registered

Unchanged:

  • pre_hooks registered to a grad_fn still aren't expected to execute when the corresponding tensors are passed as inputs= to .grad()
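
To make the intended ordering concrete, here is a minimal Python sketch. The expected order and values reflect the behavior described above (not prior releases), and the availability of Node.register_prehook on grad_fn is assumed:

```python
import torch

order = []

def tensor_pre_hook(g):
    order.append("tensor")
    return g * 10            # returning a Tensor modifies the gradient

def grad_fn_pre_hook(grad_outputs):
    order.append("grad_fn")  # returning None leaves the gradient unchanged

a = torch.tensor(1.0, requires_grad=True)
b = a * 2  # non-leaf

b.retain_grad()                               # retains_grad hook
b.register_hook(tensor_pre_hook)              # tensor pre-hook
b.grad_fn.register_prehook(grad_fn_pre_hook)  # pre-hook on b's grad_fn

(b * 3).backward()

print(order)   # expected: ['tensor', 'grad_fn'] -- tensor pre-hooks run before grad_fn pre-hooks
print(b.grad)  # expected: tensor(30.) -- retains_grad runs last and captures the modified gradient
print(a.grad)  # expected: tensor(60.) -- downstream nodes also see the modified gradient
```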

Follow ups:

  • simplify the retains_grad field so that it is not a vector, since it always holds a single hook
  • potentially merge capture hooks with tensor pre-hooks; this would involve some additional refactoring
  • the behavior of Python hooks registered to a tensor under in-place operations is still wrong

cc @ezyang @gchanan

@pytorch-bot

pytorch-bot bot commented Sep 28, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85849

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures

As of commit 07b732c:

FLAKY - The following jobs failed but were likely due to flakiness present on master:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

soulitzer added a commit that referenced this pull request Sep 28, 2022
@soulitzer soulitzer marked this pull request as draft September 28, 2022 21:07
@soulitzer soulitzer added module: bc-breaking Related to a BC-breaking change release notes: autograd release notes category topic: bc breaking topic category labels Sep 28, 2022
Notable points:
- With this PR, PyFunctionTensorPreHooks registered to a leaf tensor will not change the input you receive in the post hook, but I think that is just the status quo.
- We'd like for pre hooks (but not post hooks) to be able to modify the gradients before they are captured (this replicates what happens when accumulate_grad=true); see the sketch below.
- Hooks registered to tensors (PyFunctionTensorPreHooks, as opposed to PyFunction(Pre|Post)Hooks) are not registered to the accumulate grad node. They are saved on the tensor's autograd_meta and accessed exclusively by AccumulateGrad::apply.
   - I'm not actually sure why this is the case yet. Maybe we should fix it?

Alternative design:
- Actually set the needed_ of the outputs to true even when accumulate_grad=false, but also modify AccumulateGrad::apply to behave as a no-op when this happens. This seemed easier at first, but I ended up not doing it because I wanted to avoid the extra TLS accesses.
[ghstack-poisoned]
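
To illustrate the "modify before capture" point from the Python side (where Tensor.register_hook is, presumably, the user-facing path that ends up as a PyFunctionTensorPreHook), a minimal sketch; the printed value is the behavior this PR aims for:

```python
import torch

# A tensor pre-hook on a leaf scales the gradient. With this PR the hook runs
# even when the leaf is passed as inputs= to .grad(), so the captured gradient
# reflects the modification -- mirroring what accumulation into .grad sees
# when accumulate_grad=true (i.e. under backward()).
a = torch.tensor(2.0, requires_grad=True)
a.register_hook(lambda g: g * 10)

out = a * 3
(g,) = torch.autograd.grad(out, inputs=(a,))
print(g)  # expected with this PR: tensor(30.) -- 3.0 scaled by the hook
```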
soulitzer added a commit that referenced this pull request Sep 29, 2022
soulitzer added a commit that referenced this pull request Sep 30, 2022
@soulitzer soulitzer changed the title from "Update autograd.grad to call hooks registered to leaf nodes" to "Update autograd.grad to call the next node's prehook before capture" Sep 30, 2022
Collaborator
@albanD albanD left a comment

Sounds good.
This is BC-breaking though?

new_grad = (*hook)({new_grad})[0];
}
return new_grad;
std::vector<std::unique_ptr<FunctionPreHook>>& tensor_pre_hooks() noexcept
Collaborator

Shouldn't we have the same for retain_grad hook? Or is it assumed that leaf Tensors can't have retain_grad hooks?

Contributor Author

Yes, I think it is OK to assume that; retain_grad() is already a no-op on leaf tensors (see the sketch below).
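
For context, a small sketch of why that assumption is safe: a leaf's .grad is populated through AccumulateGrad rather than through a retains_grad hook, so retain_grad() only matters for non-leaf tensors.

```python
import torch

leaf = torch.tensor(1.0, requires_grad=True)
leaf.retain_grad()   # no-op: leaves already populate .grad via AccumulateGrad
mid = leaf * 2
mid.retain_grad()    # registers a retains_grad hook for the non-leaf

(mid * 3).backward()
print(leaf.grad)  # tensor(6.) -- accumulated by AccumulateGrad
print(mid.grad)   # tensor(3.) -- captured by the retains_grad hook
```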

// Returns the inputs as a list of variables. Destroys given InputBuffer.
static std::vector<Variable> variables(InputBuffer&& g);

private:
Collaborator

Why?

Contributor Author

Let me see if I can avoid this.

Contributor Author

I don't think we can easily avoid this for now. It should be gone for sure once we eventually remove the usage of capture hooks, so it should just be a temporary hack.

var->mutable_grad() = var->grad() + grad;
}
}
return at::TensorBase{};
Collaborator

What is the contract on the returned value here?

Contributor Author

If the return value is defined, then it will modify the gradient; otherwise, if it is undefined, the hook is considered to be only "reading" the value of the gradient. (A Python-level sketch of the analogous contract follows the snippet below.)

See:

if (!res.defined()) {
// Don't change gradient
continue;
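
For illustration, the Python-level analogue of that contract (a sketch with assumed values): a hook returning None only reads the gradient, while a hook returning a Tensor replaces it for everything downstream.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x * 4

seen = []
y.register_hook(lambda g: seen.append(g.item()))  # returns None: read-only
y.register_hook(lambda g: g + 1)                  # returns a Tensor: replaces the gradient

y.backward()
print(seen)    # [1.0] -- the incoming gradient at y, observed unchanged
print(x.grad)  # tensor(8.) -- (1.0 + 1.0) * 4: downstream sees the modified gradient
```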

// we will additionally register a single hook to the grad_fn.
//
// Note that the cpp and python use cases aren't actually aware of
// each other, so using both is not defined behavior.
Collaborator

What would happen in practice before this PR? They will just run in arbitrary order?
What about after this PR?

Contributor Author

From the Python side, the hooks_ field is cleared every time the backward_hooks field is set, and a PyFunctionHook is registered.
From the cpp side, the hooks_ field is set whenever a cpp hook is registered AND cpp_hooks_list_ is empty. (That is also the case when we register a retains_grad hook, but it shouldn't matter because hooks_ is only used on leaf nodes and retains_grad is a no-op when invoked on a leaf node.)

What happens is that one will be overwritten by the other.

This is still the case after this PR, but I'm okay with not caring about the mixed Python + cpp case.

Collaborator

Ho wow this is very badly broken :O good to know!

Collaborator
@albanD albanD left a comment

lint needs fixing but sounds good otherwise!

@soulitzer soulitzer added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 11, 2023
@soulitzer
Contributor Author

@pytorchbot merge -f "Unrelated failures"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

return pre_hooks_;
}

virtual std::vector<std::unique_ptr<FunctionPreHook>>&
Contributor

Could this virtual be an error? How is tensor_pre_hooks_ different from retains_grad_hooks_?

Contributor Author
@soulitzer soulitzer Jan 18, 2023

How is tensor_pre_hooks_ different from retains_grad_hooks_?

There's a class that inherits from Node that overrides tensor_pre_hooks_.

@albanD
Collaborator

albanD commented Jan 18, 2023

@pytorchbot revert -h

@pytorch-bot

pytorch-bot bot commented Jan 18, 2023

❌ 🤖 pytorchbot command failed:

@pytorchbot revert: error: the following arguments are required: -m/--message, -c/--classification

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Try @pytorchbot --help for more info.

@albanD
Collaborator

albanD commented Jan 18, 2023

@pytorchbot revert -m "fails internal build" -c nosignal

ld.lld: error: undefined symbol: typeinfo for torch::autograd::FunctionPreHook

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

@soulitzer your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Jan 18, 2023
This reverts commit 049838f.

Reverted #85849 on behalf of https://github.com/albanD due to fails internal build
soulitzer added a commit that referenced this pull request Jan 18, 2023
soulitzer added a commit that referenced this pull request Jan 19, 2023
pytorchmergebot pushed a commit that referenced this pull request Jan 19, 2023
This reverts commit e525f43.

Original PR:  #85849
Fixes #ISSUE_NUMBER

In addition to reverting the revert, this PR:
- defines the virtual destructor of FunctionPreHook in the header. Why? Presumably the internal build imports the header from somewhere but does not have function_hooks.cpp (where the virtual destructor was previously defined) in the same compilation unit, so the typeinfo for FunctionPreHook was never emitted.

Pull Request resolved: #92559
Approved by: https://github.com/albanD
@facebook-github-bot facebook-github-bot deleted the gh/soulitzer/139/head branch June 8, 2023 18:46