Fix DDPOptimizer fake_mode execution #92986

wconstab · 2023-01-25T18:00:40Z

Stack from ghstack (oldest at bottom):

-> Fix DDPOptimizer fake_mode execution #92986

When running compiled submods for the purpose of producing outputs to pass
to the compilation step for the next submod, we use fake parameters and
assume fake inputs, but we forgot to activate our fake_mode during execution.

This caused failures in certain edge cases where tensors other than activations
or parameters were created during execution, such as the scalar->tensor expansion
that happens when executing torch.where(tensor, scalar, scalar).

Also add a test and clarify behavior of DDPOptimizer via comments.

Fixes #92941
cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire
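To illustrate the description above, here is a minimal sketch of running a function under an active fake mode. It is not the DDPOptimizer code itself: `submod` and `fake_activation` are hypothetical stand-ins for a compiled submodule and its fake input, and the sketch assumes a PyTorch build that exposes `FakeTensorMode` in `torch._subclasses.fake_tensor`.

```python
import torch
from torch._subclasses.fake_tensor import FakeTensor, FakeTensorMode

# A fake mode plus one fake input, standing in for the fake activations
# that DDPOptimizer passes between submodules (names are illustrative).
fake_mode = FakeTensorMode()
fake_activation = fake_mode.from_tensor(torch.randn(4))


def submod(x):
    # Hypothetical stand-in for a compiled submodule. The two Python scalars
    # are expanded into tensors while the op executes.
    return torch.where(x > 0, 1.0, 0.0)


# Executing with the mode active means tensors created during execution
# (here, the expanded scalars) are also created as fake tensors.
with fake_mode:
    out = submod(fake_activation)
```

With the mode active, `out` is a FakeTensor carrying only metadata (shape, dtype, device), which is all the next compilation step needs.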


pytorch-bot · 2023-01-25T18:00:44Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/92986

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit cac8faa:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


wconstab · 2023-01-25T18:09:44Z

@voznesenskym Semi-related to this PR, is the code that infers a FakeMode from input tensors or else creates a new one (https://github.com/pytorch/pytorch/blob/master/torch/_dynamo/optimizations/distributed.py#L148) good enough going forward, or do we want to align this FakeMode better with one that lives in dynamo? I'm not sure what the current design is.

voznesenskym · 2023-01-25T18:58:29Z

torch/_dynamo/optimizations/distributed.py

+                        # Finally, we have to produce inputs for use compiling the next submodule,
+                        # and these need to be FakeTensors, so we execute the module under fake_mode
+                        with fake_mode:
+                            return curr_submod(*new_args, **kwargs)


This code is executed at runtime, right?

not sure what you mean.

Since TorchDynamo defers its compilation until the first execution, in a way yes, this code happens "at runtime".

But this code only happens as a part of the compilation flow, which in a simple (static model) scenario only happens once. The second time a user calls their compiled ddp model, none of this code should run, since we're not recompiling.

Maybe you confused it with `WrapperModule.forward`? That's the only piece of code in the whole `ddp_optimizer` file that I'd expect to run repeatedly on every runtime call. (All it does is unwrap the tuple output from the compiled subgraph.)

Yeah, `WrapperModule.forward` is the place I was thinking of.

This looks fine to me.

When Ed and I were working on it, it was very confusing which part of this was compile time and which was runtime.

voznesenskym · 2023-01-25T19:21:30Z

> @voznesenskym Semi-related to this PR, is the code that infers a FakeMode from input tensors or else creates a new one (https://github.com/pytorch/pytorch/blob/master/torch/_dynamo/optimizations/distributed.py#L148) good enough going forward, or do we want to align this FakeMode better with one that lives in dynamo? I'm not sure what the current design is.

`fake_mode = fake_mode_from_tensors(example_inputs)` is the right way to do things :) I added it for this purpose. The idea is that you make a best effort to get the current fake mode, and if there isn't one, you can make one. There are a few useful comments in that definition; one should say something along the lines of maybe having it always provide a fake_mode...
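The "best effort, else create" idea can be sketched roughly as follows. This is not the actual definition of `fake_mode_from_tensors`: `infer_or_create_fake_mode` is a hypothetical helper written for illustration, and it assumes only that a `FakeTensor` exposes the mode that created it as its `fake_mode` attribute.

```python
import torch
from torch._subclasses.fake_tensor import FakeTensor, FakeTensorMode


def infer_or_create_fake_mode(inputs):
    """Hypothetical helper: reuse the mode of the first fake input, else make one."""
    for t in inputs:
        if isinstance(t, FakeTensor):
            # Best effort: a fake input already carries the mode that made it.
            return t.fake_mode
    # No fake inputs were found, so fall back to a fresh mode.
    return FakeTensorMode()


existing = FakeTensorMode()
fake_input = existing.from_tensor(torch.ones(2, 3))

# One of the inputs is fake, so its mode is reused.
reused_mode = infer_or_create_fake_mode([torch.ones(2), fake_input])

# Only real tensors here, so a new mode is created.
fresh_mode = infer_or_create_fake_mode([torch.ones(2)])
```

Reusing the existing mode matters because fake tensors from different modes generally cannot be mixed in one operation.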

wconstab · 2023-01-25T19:22:13Z

@pytorchbot merge

pytorchmergebot · 2023-01-25T19:24:02Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


wconstab requested review from H-Huang, awgu, kwen2501, mrshenli, rohan-varma, wanchaol and zhaojuanmao as code owners January 25, 2023 18:00

github-actions bot added ciflow/inductor module: dynamo labels Jan 25, 2023

wconstab requested review from bdhirsh, ezyang, ngimel and voznesenskym and removed request for H-Huang, awgu, kwen2501, mrshenli, rohan-varma, wanchaol and zhaojuanmao January 25, 2023 18:00

wconstab mentioned this pull request Jan 25, 2023

torch.where + DDPoptimizer + Dynamo causes faketensor error #92941

Closed

bdhirsh approved these changes Jan 25, 2023


wconstab added the release notes: distributed (ddp) release notes category label Jan 25, 2023

voznesenskym reviewed Jan 25, 2023


pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 25, 2023

This was referenced Jan 25, 2023

[abandoned] Add native allreduce and handle processgroup as int #92167

Closed

[abandoned] - Debug inductor lowering for allreduce #92735

Closed

Fix FlexibleLayout issue? #93016

Closed

wconstab changed the title ~~Fix DDOptimizer fake_mode execution~~ Fix DDPOptimizer fake_mode execution Jan 25, 2023

pytorchmergebot added the Merged label Jan 26, 2023

pytorchmergebot closed this in 5441f2c Jan 26, 2023

wconstab deleted the gh/wconstab/75/head branch January 26, 2023 01:33

wconstab mentioned this pull request Feb 13, 2023

[Dynamo] Dyanmo with DDP Failed to Turn Tensor Constructor to FakeTensor #94574

Closed

chedatomasz mentioned this pull request Jun 29, 2023

torch.compile + DDP + weight_norm crashes related to FakeTensor handling #104446

Closed
