Pass dynamo's fake_mode down to aot_autograd, remove duplicate fake tensor conversion, install aot guards in dynamo #88546
Conversation
```diff
 assert fake_mode, "Fake mode must be passed in"
 new_gm = deepcopy_to_fake_tensor(gm, fake_mode)
-with fake_mode, enable_python_dispatcher():
+with enable_python_dispatcher():
```
So FakeTensorMode is no longer enabled when this pass runs? The bad situation I'm imagining is:
(1) The graph contains a factory function, so interpreting the graph ends up creating a real tensor with real sizes (bad because we do unnecessary compute, etc.).
(2) The graph contains a SymInt-related op, like s1.__floordiv__(s2), and we grab the real sizes off of our real tensor, so we break in this pass when we run that op.
The original idea behind me adding that temporary extra FakeTensorMode and using it here was that:
(1) We only want to create fake tensors and symbolic shapes while running this pass.
(2) Using the existing ShapeEnv in dynamo might be bad, because we could end up installing a bunch of redundant guards in it when we run this pass. It sounds like this was the wrong thing to do, though, since mixing tensors across multiple FakeTensorModes causes issues.
That was just my brain-dump understanding. If we're going to try to remove this pass soon anyway, though, and this change fixes existing issues in the meantime, then I'm for landing it.
We can keep fake_mode here, we need it for the deepcopy anyway, I just thought it was spurious.
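A minimal standalone sketch of the concern in this exchange, not the code under review: interpreting a graph that contains a factory op outside of a FakeTensorMode allocates a real tensor and does real compute, while running the same code under the mode only propagates shapes and dtypes.

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

def run_graph_body():
    # Stand-in for interpreting a captured graph that contains a factory call.
    t = torch.zeros(1024, 1024)   # factory op: allocates for real outside a fake mode
    return t * 2                  # real compute outside a fake mode

real_out = run_graph_body()       # real storage, real FLOPs

fake_mode = FakeTensorMode()
with fake_mode:
    fake_out = run_graph_body()   # FakeTensors only: metadata propagation, no real storage

print(type(real_out), type(fake_out))  # torch.Tensor vs FakeTensor
```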
Only carefully reviewed the AOT parts, but mostly LGTM.
```diff
 def _create_aot_dispatcher_function(
-    flat_fn, flat_args: List[Tensor], aot_config: AOTConfig
+    flat_fn, fake_flat_tensor_args: List[Tensor], aot_config: AOTConfig, fake_mode,
 ):
```
Probably worth an assertion here
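A hedged sketch of what such an assertion could look like (`_assert_args_are_fake` is a hypothetical helper, not part of the PR): since this function no longer does its own fake tensor conversion, every tensor argument should already be a FakeTensor belonging to the fake_mode that was passed in.

```python
import torch
from torch._subclasses.fake_tensor import FakeTensor

def _assert_args_are_fake(fake_flat_tensor_args, fake_mode):
    # Hypothetical helper: callers (dynamo, or aot_function's process_inputs path)
    # are responsible for fake-ifying inputs before reaching the dispatcher.
    if fake_mode is None:
        return
    for arg in fake_flat_tensor_args:
        if isinstance(arg, torch.Tensor):
            assert isinstance(arg, FakeTensor), (
                f"expected FakeTensor inputs, got {type(arg)}"
            )
            assert arg.fake_mode is fake_mode, (
                "input was created under a different FakeTensorMode"
            )
```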
functorch/_src/aot_autograd.py (outdated)
```python
fake_mode = None
if "fake_mode" in top_kwargs and config.use_fake_tensor:
```
At this point it's worth duplicating the necessary args/kwargs to this function.
Ah yeah, another PR though, I think?
Hmm, nah, I'll do it now.
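A sketch of the suggestion under discussion, with illustrative names rather than functorch's actual signatures: instead of probing a forwarded `**top_kwargs` dict for `"fake_mode"`, spell the parameter out in the signature so the dependency is visible to callers and readers.

```python
from typing import Callable

# Illustrative signatures only; the real functorch entry points differ.
def entry_point_before(fn: Callable, **top_kwargs):
    fake_mode = None
    if "fake_mode" in top_kwargs:          # hidden dependency on a kwargs dict
        fake_mode = top_kwargs["fake_mode"]
    ...

def entry_point_after(fn: Callable, fake_mode=None, use_fake_tensor: bool = True):
    # fake_mode is now an explicit, documented parameter; no dict probing needed.
    if fake_mode is None and use_fake_tensor:
        ...  # fall back to constructing a fresh FakeTensorMode, as before
```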
functorch/_src/aot_autograd.py (outdated)
```python
    fake_mode,
)

compiled_fn = compile(fn, *fake_flat_tensor_args, *inputs)
```
Why even have a wrapper?
functorch/_src/aot_autograd.py (outdated)
```python
    return out


def aot_function_simplified(
```
imo just delete aot_function_simplified and directly call _create_aot_dispatcher_function. The more layers of indirection we can remove the better :)
Sure
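A hedged sketch of what the suggested call site could look like; the import path, the `lower_directly` name, and the idea that `aot_config` arrives pre-built are assumptions for illustration, not the PR's final code.

```python
# Assumed import path for the names referenced in this thread; treat as illustrative.
from functorch._src.aot_autograd import AOTConfig, _create_aot_dispatcher_function

def lower_directly(flat_fn, fake_flat_tensor_args, aot_config: AOTConfig, fake_mode):
    # No aot_function_simplified wrapper in between: the dispatcher function is
    # built once, at lowering time, and handed straight back to the caller.
    return _create_aot_dispatcher_function(
        flat_fn, fake_flat_tensor_args, aot_config, fake_mode
    )
```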
This contains #88546 plus a big pile of inductor hacks. Signed-off-by: Edward Z. Yang <ezyang@fb.com> [ghstack-poisoned]
…ductor" This contains #88546 plus a big pile of inductor hacks. ``` $ TORCHDYNAMO_DYNAMIC_SHAPES=1 AOT_DYNAMIC_SHAPES=1 python benchmarks/dynamo/torchbench.py --accuracy --backend inductor --training --only BERT_pytorch cuda train BERT_pytorch PASS ``` Signed-off-by: Edward Z. Yang <ezyangfb.com> cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire [ghstack-poisoned]
…ductor" This contains #88546 plus a big pile of inductor hacks. ``` $ TORCHDYNAMO_DYNAMIC_SHAPES=1 AOT_DYNAMIC_SHAPES=1 python benchmarks/dynamo/torchbench.py --accuracy --backend inductor --training --only BERT_pytorch cuda train BERT_pytorch PASS ``` I don't know if we're actually generating generic kernels though. Maybe Chillee can check. Signed-off-by: Edward Z. Yang <ezyangfb.com> cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire [ghstack-poisoned]
…ductor" This contains #88546 plus a big pile of inductor hacks. ``` $ TORCHDYNAMO_DYNAMIC_SHAPES=1 AOT_DYNAMIC_SHAPES=1 python benchmarks/dynamo/torchbench.py --accuracy --backend inductor --training --only BERT_pytorch cuda train BERT_pytorch PASS ``` I don't know if we're actually generating generic kernels though. Maybe Chillee can check. Signed-off-by: Edward Z. Yang <ezyangfb.com> cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire [ghstack-poisoned]
…ensor conversion, install aot guards in dynamo (#88546)
Step 2 of https://docs.google.com/document/u/1/d/1QJ-M4zfMkD-fjHIqW089RptjLl9EgozZGCceUbvmgfY/edit
Step 1 can be found here: #87570
The problem this PR solves is that today, we have a world wherein dynamo creates guards and installs them, but aot_autograd creates guards... and does not install them. This means that code executed in make_fx can produce new symbolic shape guards, that then do not get used anywhere.
The order today, before this PR, is:
1. dynamo traces the frame, creates its guards, and installs them.
2. dynamo compiles the frame with its backend.
3. At runtime, aot_autograd lazily compiles the function; any symbolic shape guards make_fx produces at that point are never installed anywhere.
Our solution to this is to ensure that make_fx's guards bubble up to dynamo, in a "unified cache", and we do this by piping a fake_mode down to aot_autograd from dynamo.
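An illustrative sketch (not the PR's code) of why sharing the mode matters: a FakeTensorMode carries a ShapeEnv, and tracing under dynamo's own mode means any symbolic-shape guards created during that tracing accumulate in the ShapeEnv dynamo installs guards from, rather than being stranded in a private one.

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode
from torch.fx.experimental.symbolic_shapes import ShapeEnv

shape_env = ShapeEnv()                             # stands in for dynamo's ShapeEnv
dynamo_fake_mode = FakeTensorMode(shape_env=shape_env)

with dynamo_fake_mode:
    x = torch.empty(8, 3)                          # fake tensor tied to dynamo's ShapeEnv
    # The backend's make_fx / aot_autograd tracing would run here, under the same
    # mode, so any guards it records land in `shape_env` instead of in a second,
    # throwaway ShapeEnv that nothing ever reads.
```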
The unified cache architecture works by changing the lifecycle of when we compile the aot_autograd function from runtime to lowering time: we go from lazily compiling create_aot_dispatcher_function to always invoking it at lowering time. This is sound because the compiled_fn is protected by dynamo's guards. The order, therefore, is now:
1. dynamo traces the frame and creates its guards.
2. At lowering time, dynamo invokes create_aot_dispatcher_function with its own fake_mode, so any symbolic shape guards produced by make_fx land in the same ShapeEnv.
3. dynamo installs the combined guards, and the compiled_fn runs under their protection.
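A minimal, hypothetical before/after sketch of that lifecycle change (the function names are made up): lazy lowering defers the build to the first call at runtime, after dynamo has already installed its guards, while eager lowering runs the build while guards can still be collected.

```python
from typing import Callable

def lower_lazily(build_compiled_fn: Callable[[], Callable]) -> Callable:
    compiled = None
    def runtime_fn(*args):
        nonlocal compiled
        if compiled is None:
            # First call, at runtime: guards created here have nowhere to go.
            compiled = build_compiled_fn()
        return compiled(*args)
    return runtime_fn

def lower_eagerly(build_compiled_fn: Callable[[], Callable]) -> Callable:
    # Lowering time: dynamo is still around to pick up any guards produced here,
    # and the returned compiled_fn is protected by the guards it installs.
    return build_compiled_fn()
```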
As an added bonus, this allows us to remove the duplicate fake tensor conversion code that used to always be invoked in `create_aot_dispatcher_function` through `process_inputs`, reusing the fake tensors from dynamo's conversion step instead. Outside of dynamo's invocation path we still need this layer to exist, though, since `aot_function` is a public entry point to aot_autograd, so the `process_inputs`/fake tensor conversion code gets lifted up to there.

This also changes the story of `create_aot_dispatcher_function`: specifically, it now MUST be called with fake tensors if we are in fake tensor mode, as it does no fake tensor conversion of its own. For the dynamo path we rely on dynamo passing fake tensors down; for the public entry point we rely on `aot_function`.

One annoying sort of stopgap around this is the parameter story. Dynamo fake-ifies only its inputs, and params get treated a little differently. This is made more confusing by the fact that while `create_aot_dispatcher_function` takes all fake tensors (params and inputs both), `aot_function_simplified` still needs real parameters, as it uses them as inputs to the compiled aot_function. This means we need to, later on, either fake-ify parameters at dynamo time and pass them alongside the real ones, or keep the param-only fake tensor conversion where it is in this PR.

cc @mlazos @soumith @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire