Skip to content

Conversation

@mikeiovine
Copy link

@mikeiovine mikeiovine commented Mar 30, 2022

Stack from ghstack (oldest at bottom):

It's clear that we don't want to manage tensors that escape their scope. Previously, we handled this by checking whether the tensor aliased the graph outputs. But there's actually another way to escape scope: by aliasing the wildcard set. The following graph demonstrates this:

def forward(self, cond: bool, a, b):
    lst = []
    if cond:
        res = a + b # res should not be managed!!!
        lst.append(res)
    return lst

The if cond: sub-block returns nothing, but res escapes the scope through lst.

The fix is simple: we simply have to mark values that alias the wildcard set as an external_alias_ in ValueGroup.

This diff also exposed another issue (via unit tests) in checkOutputTensorMemoryLeaks: it assumes that, if a node's Value* is managed, the underlying IValue must be a tensor. But this is not true after the addition of to_maybe_copy_out; TMCO does not produce a tensor in its first output slot if it does not copy.

Differential Revision: D35257087

It's clear that we don't want to manage tensors that escape their scope. Previously, we handled this by checking whether the tensor aliased the graph outputs. But there's actually another way to escape scope: by aliasing the wildcard set. The following graph demonstrates this:

```
def forward(self, cond: bool, a, b):
    lst = []
    if cond:
        res = a + b # res should not be managed!!!
        lst.append(res)
    return lst
```

The `if cond:` sub-block returns nothing, but `res` escapes the scope through `lst`.

The fix is simple: we simply have to mark values that alias the wildcard set as an `external_alias_` in `ValueGroup`.

This diff also exposed another issue (via unit tests) in `checkOutputTensorMemoryLeaks`: it assumes that, if a node's `Value*` is managed, the underlying `IValue` must be a tensor. But this is not true after the addition of `to_maybe_copy_out`; TMCO does not produce a tensor in its first output slot if it does not copy.

Differential Revision: [D35257087](https://our.internmc.facebook.com/intern/diff/D35257087/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Mar 30, 2022

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 63b0e06 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@facebook-github-bot facebook-github-bot added cla signed oncall: jit Add this issue/PR to JIT oncall triage queue labels Mar 30, 2022
mikeiovine pushed a commit that referenced this pull request Mar 30, 2022
It's clear that we don't want to manage tensors that escape their scope. Previously, we handled this by checking whether the tensor aliased the graph outputs. But there's actually another way to escape scope: by aliasing the wildcard set. The following graph demonstrates this:

```
def forward(self, cond: bool, a, b):
    lst = []
    if cond:
        res = a + b # res should not be managed!!!
        lst.append(res)
    return lst
```

The `if cond:` sub-block returns nothing, but `res` escapes the scope through `lst`.

The fix is simple: we simply have to mark values that alias the wildcard set as an `external_alias_` in `ValueGroup`.

This diff also exposed another issue (via unit tests) in `checkOutputTensorMemoryLeaks`: it assumes that, if a node's `Value*` is managed, the underlying `IValue` must be a tensor. But this is not true after the addition of `to_maybe_copy_out`; TMCO does not produce a tensor in its first output slot if it does not copy.

Differential Revision: [D35257087](https://our.internmc.facebook.com/intern/diff/D35257087/)

ghstack-source-id: 152603154
Pull Request resolved: #74966
It's clear that we don't want to manage tensors that escape their scope. Previously, we handled this by checking whether the tensor aliased the graph outputs. But there's actually another way to escape scope: by aliasing the wildcard set. The following graph demonstrates this:

```
def forward(self, cond: bool, a, b):
    lst = []
    if cond:
        res = a + b # res should not be managed!!!
        lst.append(res)
    return lst
```

The `if cond:` sub-block returns nothing, but `res` escapes the scope through `lst`.

The fix is simple: we simply have to mark values that alias the wildcard set as an `external_alias_` in `ValueGroup`.

This diff also exposed another issue (via unit tests) in `checkOutputTensorMemoryLeaks`: it assumes that, if a node's `Value*` is managed, the underlying `IValue` must be a tensor. But this is not true after the addition of `to_maybe_copy_out`; TMCO does not produce a tensor in its first output slot if it does not copy.

Differential Revision: [D35257087](https://our.internmc.facebook.com/intern/diff/D35257087/)

[ghstack-poisoned]
mikeiovine pushed a commit that referenced this pull request Apr 7, 2022
Pull Request resolved: #74966

It's clear that we don't want to manage tensors that escape their scope. Previously, we handled this by checking whether the tensor aliased the graph outputs. But there's actually another way to escape scope: by aliasing the wildcard set. The following graph demonstrates this:

```
def forward(self, cond: bool, a, b):
    lst = []
    if cond:
        res = a + b # res should not be managed!!!
        lst.append(res)
    return lst
```

The `if cond:` sub-block returns nothing, but `res` escapes the scope through `lst`.

The fix is simple: we simply have to mark values that alias the wildcard set as an `external_alias_` in `ValueGroup`.

This diff also exposed another issue (via unit tests) in `checkOutputTensorMemoryLeaks`: it assumes that, if a node's `Value*` is managed, the underlying `IValue` must be a tensor. But this is not true after the addition of `to_maybe_copy_out`; TMCO does not produce a tensor in its first output slot if it does not copy.
ghstack-source-id: 153288188

Differential Revision: [D35257087](https://our.internmc.facebook.com/intern/diff/D35257087/)
facebook-github-bot pushed a commit that referenced this pull request Apr 7, 2022
Summary:
Pull Request resolved: #74966

It's clear that we don't want to manage tensors that escape their scope. Previously, we handled this by checking whether the tensor aliased the graph outputs. But there's actually another way to escape scope: by aliasing the wildcard set. The following graph demonstrates this:

```
def forward(self, cond: bool, a, b):
    lst = []
    if cond:
        res = a + b # res should not be managed!!!
        lst.append(res)
    return lst
```

The `if cond:` sub-block returns nothing, but `res` escapes the scope through `lst`.

The fix is simple: we simply have to mark values that alias the wildcard set as an `external_alias_` in `ValueGroup`.

This diff also exposed another issue (via unit tests) in `checkOutputTensorMemoryLeaks`: it assumes that, if a node's `Value*` is managed, the underlying `IValue` must be a tensor. But this is not true after the addition of `to_maybe_copy_out`; TMCO does not produce a tensor in its first output slot if it does not copy.
ghstack-source-id: 153288188

Test Plan: New unit tests cover the problematic case

Reviewed By: navahgar

Differential Revision: D35257087

fbshipit-source-id: 853a761dffe51f2c70720759664dd8dfcd56d1d7
@github-actions
Copy link
Contributor

github-actions bot commented Apr 7, 2022

Hey @mikeiovine.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed oncall: jit Add this issue/PR to JIT oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants