[Profiler] Memory profiler part 1: Gradient identification #86802
There are multiple ways to identify that a Tensor is a gradient, a subset of which also give additional context. To start off, I've made a utility to handle that determination. Differential Revision: [D39920730](https://our.internmc.facebook.com/intern/diff/D39920730/) [ghstack-poisoned]
Dr. CI (automatically generated comment): ✅ No failures as of commit 3628e59. See artifacts and rendered test results at hud.pytorch.org/pr/86802.
In the interest of keeping code quality high, I've opted the memory profiler into …
albanD
left a comment
I'm not sure about the high-level strategy here. What is the exact goal?
If you were recording while the user code was executed, why not just tag all the Tensors created during the backward pass as they get created?
    children = node.children

    # AccumulateGrad is used in the Autograd engine to handle gradient updates.
    # There are two possible cases:
Actually there are a few more:
    // Given a variable with its current grad as variable_grad, accumulates
I added aten::add for the double backward case. It's also worth noting that this doesn't have to be 100% foolproof. I mostly just want a reasonable fallback, but I'm expecting most gradients to be scooped up by nn.Module instrumentation. If you think I've missed any important cases, do let me know though.
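For readers following the thread, the fallback described here can be sketched roughly as below. This is a toy model, not the code in this PR: `Event`, `walk`, and `find_gradients` are hypothetical stand-ins for the profiler's event tree, and the node and child op names (`torch::autograd::AccumulateGrad`, `aten::detach`, `aten::add_`, `aten::add`) are assumptions used for illustration. The idea is only that an AccumulateGrad node whose first child is one of a few known ops marks that child's first input as a gradient.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


# Hypothetical, simplified stand-in for a profiler event; NOT the real profiler type.
@dataclass
class Event:
    name: str
    inputs: List[str] = field(default_factory=list)  # ids of input tensors
    children: List["Event"] = field(default_factory=list)


# Child op names assumed for illustration: detach for a freshly created
# gradient, in-place add for accumulation, out-of-place add for double backward.
ACCUMULATE_CHILD_OPS = ("aten::detach", "aten::add_", "aten::add")


def extract_gradients(node: Event) -> Iterator[str]:
    """Yield the tensor id that an AccumulateGrad node treats as a gradient."""
    if (
        node.name == "torch::autograd::AccumulateGrad"
        and node.children
        and node.children[0].name in ACCUMULATE_CHILD_OPS
        and node.children[0].inputs
    ):
        yield node.children[0].inputs[0]


def walk(node: Event) -> Iterator[Event]:
    """Depth-first traversal of the event tree."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_gradients(root: Event) -> List[str]:
    """Collect every gradient id found anywhere in the tree."""
    return [g for node in walk(root) for g in extract_gradients(node)]


# Made-up backward pass: one ordinary backward node and two AccumulateGrad nodes.
tree = Event("backward", children=[
    Event("MulBackward0", children=[Event("aten::mul", inputs=["t0", "t1"])]),
    Event("torch::autograd::AccumulateGrad",
          children=[Event("aten::detach", inputs=["w_grad"])]),
    Event("torch::autograd::AccumulateGrad",
          children=[Event("aten::add_", inputs=["b_grad", "b_update"])]),
])
gradients = find_gradients(tree)
```

Inputs of ordinary backward ops such as `aten::mul` are deliberately not reported, which matches the point that this is a fallback heuristic rather than an exhaustive rule.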
The goal is to distinguish between a proper gradient and a Tensor which is merely created as an implementation detail of an autograd formula, since they will generally have different lifetime constraints and implications on roofline memory. If you recall, one question that came up from some of the plots in the prototype was "why is gradient memory so spiky?" The answer was that I was tagging all tensors created in the backward pass as gradients rather than using the more precise definition that I'm going for here. Does that seem reasonable?
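A toy illustration of why the coarse definition looks spiky, under made-up numbers: the timeline, tensor ids, sizes, and the `gradient_bytes` helper below are all hypothetical and only meant to show the effect. Every event is assumed to happen during the backward pass, so the coarse rule tags everything, while the precise rule tags only tensors that end up as a parameter's gradient.

```python
from typing import Callable, List, Tuple

# (action, tensor id, size in bytes). All values are made up for illustration,
# and every event is assumed to occur during the backward pass.
Timeline = List[Tuple[str, str, int]]

timeline: Timeline = [
    ("alloc", "tmp0", 400),    # intermediate of an autograd formula
    ("alloc", "w_grad", 100),  # ends up as a parameter's gradient
    ("free", "tmp0", 400),
    ("alloc", "tmp1", 400),    # another short-lived intermediate
    ("alloc", "b_grad", 10),   # ends up as a parameter's gradient
    ("free", "tmp1", 400),
]


def gradient_bytes(events: Timeline, is_gradient: Callable[[str], bool]) -> List[int]:
    """Bytes attributed to the gradient category after each event."""
    live = 0
    series = []
    for action, tensor_id, size in events:
        if is_gradient(tensor_id):
            live += size if action == "alloc" else -size
        series.append(live)
    return series


# Coarse: every tensor created in the backward pass counts as a gradient.
coarse = gradient_bytes(timeline, lambda tensor_id: True)
# Precise: only tensors that become a parameter's gradient count.
precise = gradient_bytes(timeline, lambda tensor_id: tensor_id.endswith("_grad"))
```

With these numbers the coarse series rises and falls as the temporaries come and go, while the precise series only grows as each parameter gradient is materialized.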
Oh, that makes sense. I think the name is a bit confusing to me then, because all these intermediaries that get created are also gradients :p
I should note that I'm very much open to bikeshedding on the taxonomy (might even be a good composability topic...), and it's very possible that we will end up with a very fine-grained internal representation and a more coarse-grained set of user-facing labels (or a switch to decide if some similar categories should be grouped). Regardless, I'll definitely add a big ol' docstring to the …
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…6802) There are multiple ways to identify that a Tensor is a gradient, a subset of which also give additional context. To start off, I've made a utility to handle that determination. Differential Revision: [D39920730](https://our.internmc.facebook.com/intern/diff/D39920730/) Pull Request resolved: pytorch#86802 Approved by: https://github.com/chaekit