AOT Autograd refactor + cleanup, handle intermediate views of bases, use view replay, fix non-tensor input handling #89532
Conversation
CLA bot seems broken, new PR for landing at #92076
AOT Autograd refactor + cleanup, handle intermediate views of bases, use view replay, fix non-tensor input handling (#92076). Original PR: #89532. Pull Request resolved: #92076. Approved by: https://github.com/janeyx99, https://github.com/albanD
This PR is a pretty large refactor of the AOT Autograd logic, to clean things up + fix a few more broken edge cases. The changes are roughly:
(1) (largest change) For outputs of the forward that alias in some way, we used to *not* return them in the forward graph, and instead return a long tuple of ints corresponding to the metadata of the outputs’ sizes/strides/storage_offsets. The wrapper around `CompiledFunction.forward()` would then figure out how to regenerate the alias with a big `.as_strided()` call, indexing into the giant tuple of ints from the forward graph to get the size/stride metadata.
Now, the compiled forward graph instead returns the actual aliased tensor outputs along with every other output, and the wrapper uses those outputs to regenerate the aliases.
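Roughly, the difference between the two schemes looks like the sketch below. The helper names are made up for illustration and this is not the actual AOT Autograd code; it only shows how the wrapper can rebuild an aliased output purely from metadata (old scheme) versus from the tensor the graph now returns (new scheme).

```python
import torch

# Old scheme (illustrative): the fw graph returned only ints, and the wrapper
# rebuilt the alias purely from that metadata.
def regenerate_alias_old(base: torch.Tensor, size, stride, storage_offset):
    return base.as_strided(size, stride, storage_offset)

# New scheme (illustrative): the fw graph returns the aliased tensor itself,
# and the wrapper uses it to regenerate a proper view of the base.
def regenerate_alias_new(base: torch.Tensor, aliased_out: torch.Tensor):
    return base.as_strided(
        aliased_out.size(), aliased_out.stride(), aliased_out.storage_offset()
    )

x = torch.randn(4, 4, requires_grad=True)
# Pretend the graph produced a view corresponding to x[0]:
alias_from_graph = x.detach()[0]
out = regenerate_alias_new(x, alias_from_graph)  # a real view of x again
```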
Even though the aliased outputs are now returned in the compiled graph (and in `CompiledFunction.forward()`), I explicitly removed them from the backward graph. That felt like the right call (the aliases shouldn’t participate in the compiled backward, because we don’t actually care about them - we just use them to regenerate the “real” aliases in the forward), but I’m open to other ideas. Doing this required the following:
(a) I updated `CompiledFunction.forward()` to wrap all aliased outputs in an opaque `TensorAlias` wrapper object, so that the `autograd.Function` knows not to assign gradients to them.
(b) I updated `CompiledFunction.backward()` to filter out the grad_outputs that correspond to aliased outputs (which I assert are all None).
(c) Before tracing the `joint_forward_backward()`, I update `tangents` to filter out the tangents corresponding to aliased outputs.
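A minimal sketch of the `TensorAlias` idea, using a toy `autograd.Function` (this is not the real `CompiledFunction`; the class name and the toy compute are placeholders). The point is just that wrapping an aliased output hides it from autograd, so its incoming gradient slot is `None` and can be dropped in `backward()`.

```python
import torch

class TensorAlias:
    """Opaque wrapper: autograd.Function won't treat the wrapped tensor as a
    differentiable output, so no gradient is ever assigned to it."""
    def __init__(self, alias: torch.Tensor):
        self.alias = alias

class ToyCompiledFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        out = x * 2                    # "real" output, participates in backward
        aliased = out.detach()[0]      # stand-in for an output that aliases something
        return out, TensorAlias(aliased)

    @staticmethod
    def backward(ctx, grad_out, grad_alias):
        # The grad slot for the TensorAlias output is always None; drop it.
        assert grad_alias is None
        return grad_out * 2

x = torch.randn(3, 3, requires_grad=True)
out, alias = ToyCompiledFunction.apply(x)
out.sum().backward()                   # only `out` contributes gradients
```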
(2) Cleaned up the metadata and removed some redundant info - take a look at the `ViewAndMutationMetadata` class.
(3) Precomputed more things so that the hot-path code should be faster. For example, when applying mutations back to mutated inputs, we used to loop through all inputs. Now, we precompute the indices of inputs that need to be mutated and only loop through those. This should be a meaningful speedup, since many models get graphs with 200+ inputs, and only a handful need mutations.
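A sketch of the precompute, assuming a simplified metadata layout where each entry of `input_info` has a `mutates_data` flag (field and function names here are illustrative, not the exact `ViewAndMutationMetadata` fields):

```python
from typing import List

def compute_mutated_input_indices(input_info) -> List[int]:
    # Done once, at compile time: record which inputs the graph mutates.
    return [i for i, info in enumerate(input_info) if info.mutates_data]

def apply_input_mutations(mutated_indices: List[int], all_inputs, updated_inputs):
    # Hot path: touch only the handful of mutated inputs instead of scanning
    # all 200+ graph inputs on every call.
    for i in mutated_indices:
        all_inputs[i].copy_(updated_inputs[i])
```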
(4) Added support for graphs with outputs that alias intermediates. This should fix a bug that has shown up on multiple models in the benchmark suite, where a graph returns an output that aliases an intermediate, and later tries to mutate that output (there are a few GitHub issues for this that I tried to find but couldn’t).
The way I handled this: I check the `._base` attribute of every output of the forward. Any `._base` that doesn’t already exist as another output is added as an extra output of the graph. These bases also get their own metadata slots in `ViewAndMutationMetadata.output_info` (which is not strictly necessary, but made handling them easier). I then tag every output with a `._base` as having `OutputType.alias_of_intermediate`. In the wrapper around `CompiledFunction.forward()`, for every output that is an alias of an intermediate, I discard that output and regenerate it off its intermediate.
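A sketch of the `._base` scan, with illustrative names (the real logic also has to account for graph inputs and a few other cases):

```python
import torch

def collect_intermediate_bases(fw_outs):
    # Find bases of view outputs that aren't themselves returned by the graph;
    # these get appended as extra graph outputs (and tagged
    # OutputType.alias_of_intermediate in the metadata).
    extra_outputs = []
    for out in fw_outs:
        base = out._base  # non-None iff `out` is a view of another tensor
        if base is None:
            continue
        already_returned = any(base is o for o in fw_outs)
        already_collected = any(base is o for o in extra_outputs)
        if not already_returned and not already_collected:
            extra_outputs.append(base)
    return extra_outputs

intermediate = torch.randn(4, 4, requires_grad=True)
outs = [intermediate[0], intermediate[1:]]   # both alias `intermediate`
extra = collect_intermediate_bases(outs)     # -> [intermediate]
```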
(5) We now use Alban’s view-replay logic, instead of always doing `.as_strided()`. This is notably best effort, and still falls back to `.as_strided()` in many cases. In particular: in the synthetic-base tests, the aliased inputs are created in eager mode, so they are forced to always replay with `.as_strided()`.
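Roughly, the regeneration helper behaves like the sketch below. `_view_func` is a private PyTorch hook used for view replay, so the sketch guards for it and falls back to `.as_strided()`; the names and structure are illustrative, not the exact implementation.

```python
import torch

def regenerate_view(new_base: torch.Tensor, target_view: torch.Tensor):
    # Best effort: replay the original view op(s) on top of `new_base`.
    view_func = getattr(target_view, "_view_func", None)
    if view_func is not None:
        replayed = view_func(new_base)
        if replayed is not None and replayed.shape == target_view.shape:
            return replayed
    # Fallback (e.g. when the alias was created in eager mode and we only
    # have metadata): reconstruct the view with as_strided.
    return new_base.as_strided(
        target_view.size(), target_view.stride(), target_view.storage_offset()
    )
```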
(6) Fixed non-tensor input handling (see the sketch below). This was also breaking an internal test (cc @albanD @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire @mlazos @yanboliang @chunyuan-w @aazzolini). I confirmed that the invariant inside `aot_dispatch_deduplicated_autograd` is that we are given a flattened list of inputs (flattened by pytrees), but we are *not* guaranteed that the inputs are tensor-only.
(7) I think I responded to and fixed any other relevant PR feedback from the original PR.
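For the non-tensor input handling mentioned in (6), a small sketch of the kind of partitioning involved (function and variable names are made up; `tree_flatten` is from `torch.utils._pytree`):

```python
import torch
from torch.utils._pytree import tree_flatten

def split_tensor_args(args):
    # The flattened arg list can mix tensors with plain Python values.
    flat_args, spec = tree_flatten(args)
    tensor_idx = [i for i, a in enumerate(flat_args) if isinstance(a, torch.Tensor)]
    tensor_args = [flat_args[i] for i in tensor_idx]
    return flat_args, tensor_idx, tensor_args, spec

# e.g. ({"x": torch.randn(2), "scale": 3},) flattens into a list mixing a
# tensor with the int 3; only the tensor entry is treated as a differentiable
# graph input.
flat, idx, tensors, spec = split_tensor_args(({"x": torch.randn(2), "scale": 3},))
```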
Stack from ghstack (oldest at bottom):
cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire