[SR] Do not manage tensors that escape scope via container #74966

mikeiovine · 2022-03-30T18:23:52Z

Stack from ghstack (oldest at bottom):

-> [SR] Do not manage tensors that escape scope via container #74966

It's clear that we don't want to manage tensors that escape their scope. Previously, we handled this by checking whether the tensor aliased the graph outputs. But there's actually another way to escape scope: by aliasing the wildcard set. The following graph demonstrates this:

def forward(self, cond: bool, a, b):
    lst = []
    if cond:
        res = a + b # res should not be managed!!!
        lst.append(res)
    return lst

The `if cond:` sub-block returns nothing, but `res` escapes the scope through `lst`.
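To make the failure mode concrete, here is a minimal pure-Python sketch. It assumes that managed memory is reclaimed and reused between runs; `ToyPlanner`, `managed_add`, and `unmanaged_add` are hypothetical names invented for this illustration and are not part of Static Runtime.

```python
from typing import Callable, List

Vector = List[float]


class ToyPlanner:
    """Hypothetical stand-in for a memory planner that recycles one arena."""

    def __init__(self, size: int) -> None:
        self._arena: Vector = [0.0] * size  # reused on every call

    def managed_add(self, a: Vector, b: Vector) -> Vector:
        # Writes a + b into the shared arena and hands out the arena itself,
        # so every "managed" result shares the same underlying storage.
        for i, (x, y) in enumerate(zip(a, b)):
            self._arena[i] = x + y
        return self._arena


def unmanaged_add(a: Vector, b: Vector) -> Vector:
    # Allocates fresh storage for every result.
    return [x + y for x, y in zip(a, b)]


def forward(add: Callable[[Vector, Vector], Vector],
            cond: bool, a: Vector, b: Vector) -> List[Vector]:
    # Same shape as the graph in the description above.
    lst: List[Vector] = []
    if cond:
        res = add(a, b)
        lst.append(res)  # res escapes the sub-block through lst
    return lst


planner = ToyPlanner(size=2)

# Managed: the list returned by the first run is overwritten by the second.
first = forward(planner.managed_add, True, [1.0, 2.0], [3.0, 4.0])
forward(planner.managed_add, True, [10.0, 20.0], [30.0, 40.0])
managed_result = first[0]

# Unmanaged: the first result keeps its own storage and stays intact.
safe = forward(unmanaged_add, True, [1.0, 2.0], [3.0, 4.0])
forward(unmanaged_add, True, [10.0, 20.0], [30.0, 40.0])
unmanaged_result = safe[0]
```

In this toy, `managed_result` ends up holding the second run's values while `unmanaged_result` still holds the first run's, which is the kind of silent corruption the PR guards against.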

The fix is simple: mark values that alias the wildcard set as an `external_alias_` in `ValueGroup`.
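The classification change can be sketched as a toy model. This is not the `ValueGroup` code: the `external_aliases` helper, the `memory_of` map, and the string value names are all assumptions made for illustration, and the real implementation relies on the JIT's alias analysis rather than explicit memory-location sets.

```python
from typing import Dict, Set


def external_aliases(memory_of: Dict[str, Set[str]],
                     graph_outputs: Set[str],
                     wildcard_set: Set[str],
                     include_wildcards: bool) -> Set[str]:
    """Return the values that must not be managed (toy model).

    memory_of maps each value name to the abstract memory locations it may
    point to. A value is external if it shares a location with an escaping
    root: a graph output, or (with the fix) a member of the wildcard set.
    """
    roots = set(graph_outputs)
    if include_wildcards:
        roots |= wildcard_set

    escaping: Set[str] = set()
    for value in roots:
        escaping |= memory_of[value]

    return {v for v, locs in memory_of.items() if locs & escaping}


# Toy encoding of the example graph: appending res to lst is assumed to
# place res in the wildcard set.
memory_of = {"a": {"m_a"}, "b": {"m_b"}, "res": {"m_res"}, "lst": {"m_lst"}}
graph_outputs = {"lst"}
wildcard_set = {"res"}

before_fix = external_aliases(memory_of, graph_outputs, wildcard_set,
                              include_wildcards=False)
after_fix = external_aliases(memory_of, graph_outputs, wildcard_set,
                             include_wildcards=True)
```

With only graph outputs as roots, `res` looks safe to manage; once wildcard members are also treated as roots, `res` is classified as external.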

This diff also exposed another issue (via unit tests) in `checkOutputTensorMemoryLeaks`: it assumes that, if a node's `Value*` is managed, the underlying `IValue` must be a tensor. That no longer holds after the addition of `to_maybe_copy_out` (TMCO), which does not produce a tensor in its first output slot if it does not copy.
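One plausible shape for such a guard, sketched in Python rather than the actual C++: skip any managed slot that does not hold a tensor before inspecting its storage. `ToyTensor` and `find_leaked_outputs` are hypothetical names, and the PR description does not state how the real check was changed.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ToyTensor:
    # None models storage that has already been released.
    storage: Optional[List[float]]


def find_leaked_outputs(managed_values: Dict[str, Any]) -> List[str]:
    """Return names of managed tensors that still hold storage (toy check)."""
    leaked: List[str] = []
    for name, ivalue in managed_values.items():
        if not isinstance(ivalue, ToyTensor):
            # A managed slot may hold a non-tensor, e.g. an output slot
            # left empty because no copy was made; nothing to check.
            continue
        if ivalue.storage is not None:
            leaked.append(name)
    return leaked


values = {
    "released": ToyTensor(storage=None),
    "tmco_first_output": None,  # no copy happened, so no tensor here
    "still_alive": ToyTensor(storage=[1.0, 2.0]),
}
leaked = find_leaked_outputs(values)
```

Without the `isinstance` guard, the loop would try to read `.storage` on the non-tensor slot and fail, which mirrors the faulty assumption described above.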

Differential Revision: D35257087


facebook-github-bot · 2022-03-30T18:23:55Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/74966
📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓Need help or want to give feedback on the CI? Visit our office hours

💊 CI failures summary and remediations

As of commit 63b0e06 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚


Pull Request resolved: #74966
ghstack-source-id: 152603154

Summary: Pull Request resolved: #74966
ghstack-source-id: 153288188
Test Plan: New unit tests cover the problematic case
Reviewed By: navahgar
Differential Revision: D35257087
fbshipit-source-id: 853a761dffe51f2c70720759664dd8dfcd56d1d7

github-actions · 2022-04-07T19:58:33Z

Hey @mikeiovine.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot added the cla signed and oncall: jit labels Mar 30, 2022

navahgar approved these changes Apr 7, 2022


pytorchmergebot closed this in 2f98fa9 Apr 7, 2022

facebook-github-bot deleted the gh/mikeiovine/48/head branch April 11, 2022 14:17

WBobby mentioned this pull request Aug 17, 2022

Add ROCm5.2.3/AMDGPU support for PyTorch WBobby/pytorch#2

Closed
