
Conversation

There are multiple ways to identify that a Tensor is a gradient, a subset of which also give additional context. So to start off, I've made a utility to handle that determination.

Differential Revision: [D39920730](https://our.internmc.facebook.com/intern/diff/D39920730/)

[ghstack-poisoned]
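For illustration, here is a minimal sketch of the kind of determination being described (an assumed example, not the utility added in this PR; the helper names are hypothetical). It flags a Tensor as a gradient when it aliases the storage of some parameter's `.grad`; the PR itself works from profiler events (the review below discusses AccumulateGrad and nn.Module instrumentation) rather than live Tensor inspection.

```python
# Hypothetical sketch, not the PR's utility: treat a Tensor as a gradient if it
# shares a data pointer with some parameter's .grad after a backward pass.
import torch
import torch.nn as nn


def gradient_data_ptrs(module: nn.Module) -> set:
    """Collect data pointers of all parameter gradients currently held by `module`."""
    return {p.grad.data_ptr() for p in module.parameters() if p.grad is not None}


def is_gradient(t: torch.Tensor, grad_ptrs: set) -> bool:
    """Heuristic: does `t` alias a known parameter gradient?"""
    return t.data_ptr() in grad_ptrs


model = nn.Linear(4, 2)
model(torch.randn(3, 4)).sum().backward()
ptrs = gradient_data_ptrs(model)
print(all(is_gradient(p.grad, ptrs) for p in model.parameters()))  # True
```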
@pytorch-bot

pytorch-bot bot commented Oct 12, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86802

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3628e59:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@robieta
Contributor Author

robieta commented Oct 12, 2022

In the interest of keeping code quality high, I've opted the memory profiler into mypy-strict.
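As a rough illustration only (not the actual PyTorch CI wiring, and the module path is an assumption), strict type checking of the new file could be exercised directly through mypy's Python API:

```python
# Illustrative only: run mypy in strict mode over the memory profiler module.
# The file path below is an assumption about where the module lives in the tree.
from mypy import api

stdout, stderr, exit_code = api.run(["--strict", "torch/profiler/_memory_profiler.py"])
print(stdout or "no type errors found")
```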

@robieta added the ciflow/trunk and ciflow/periodic labels Oct 25, 2022
@albanD
Collaborator

albanD left a comment

I'm not sure about the high level strategy here? What is the exact goal?
If you were recording while the user code was executed, why not just tag all the Tensors created during the backward pass as they get created?

children = node.children

# AccumulateGrad is used in the Autograd engine to handle gradient updates.
# There are two possible cases:
albanD (Collaborator)

Actually there are a few more:

// Given a variable with its current grad as variable_grad, accumulates

robieta (Contributor Author)

I added aten::add for the double backward case. It's also worth noting that this doesn't have to be 100% foolproof. I mostly just want a reasonable fallback but I'm expecting most gradients to be scooped up by nn.Module instrumentation. If you think I've missed any important cases do let me know though.
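A minimal sketch of the double backward case being referred to (assumed example): when `.grad` already exists and the next backward runs with `create_graph=True`, the accumulation happens out of place, which is why an `aten::add` can show up rather than the usual in-place `aten::add_`.

```python
# Hedged illustration of the double backward accumulation case: the second
# backward runs with grad mode enabled, so AccumulateGrad combines the old and
# new gradients with an out-of-place add instead of an in-place add_.
import torch

x = torch.ones(3, requires_grad=True)
(x ** 2).sum().backward(create_graph=True)  # creates x.grad
(x ** 3).sum().backward(create_graph=True)  # accumulates into x.grad under grad mode
print(x.grad)  # 2*x + 3*x**2 at x=1 -> tensor of 5s (with a grad_fn attached)
```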

@robieta
Contributor Author

robieta commented Oct 28, 2022

> I'm not sure about the high level strategy here? What is the exact goal? If you were recording while the user code was executed, why not just tag all the Tensors created during the backward pass as they get created?

The goal is to distinguish between a proper gradient and a Tensor which is merely created as an implementation detail of an autograd formula. (Since they will generally have different lifetime constraints and implications on roofline memory.) If you recall, one question that came up from some of the plots in the prototype was "why is gradient memory so spiky?" and the answer was that I was tagging all tensors created in the backward pass as gradients rather than the more precise definition that I'm going for here. Does that seem reasonable?
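A small assumed example of that distinction (variable names arbitrary): the gradient flowing through a non-leaf intermediate exists only as an implementation detail of the backward pass, while a leaf's `.grad` is the gradient the user actually asked for and it outlives the pass.

```python
# Assumed illustration: the gradient seen by the hook on the non-leaf `h` is a
# transient implementation detail of backward, while w.grad persists afterwards.
import torch

x = torch.randn(3)
w = torch.randn(3, requires_grad=True)
h = x * w                                         # non-leaf intermediate
h.register_hook(lambda g: print("transient grad:", g))
h.sum().backward()
print("persistent grad:", w.grad)                 # still alive after backward
```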

@albanD
Collaborator

albanD commented Oct 28, 2022

Oh, that makes sense. I think the name is a bit confusing to me then, because all these intermediaries that get created are also gradients :p
The difference is that some of them are the ones requested by the user and some are just intermediary results. It might be good to make that distinction in the doc there.

@robieta
Contributor Author

robieta commented Oct 28, 2022

> Oh, that makes sense. I think the name is a bit confusing to me then, because all these intermediaries that get created are also gradients :p The difference is that some of them are the ones requested by the user and some are just intermediary results. It might be good to make that distinction in the doc there.

I should note that I'm very much open to bikeshedding on the taxonomy (it might even be a good composability topic...) and it's very possible that we will end up with a very fine-grained internal representation and a more coarse-grained set of user-facing labels. (Or a switch to decide if some similar categories should be grouped.) Regardless, I'll definitely add a big ol' docstring to the Category Enum.

@robieta
Contributor Author

robieta commented Nov 8, 2022

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022

Pull Request resolved: pytorch#86802
Approved by: https://github.com/chaekit
@facebook-github-bot deleted the gh/robieta/134/head branch June 8, 2023 18:33

Labels

ciflow/periodic, ciflow/trunk, Merged, release notes: profiler
