fix DebugInterpreter, use it + functionalization stride debugger unconditionally in aot_eager backend #91038
Conversation
…nditionally in aot_eager backend [ghstack-poisoned]
def run(self, *args):
    self.symbol_mapping = bind_symbols(self.module, *args)
    super().run(*args)
    if hasattr(self.module, "shape_env"):
oopsie, thanks
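Since only a fragment of the diff is visible above, here is a minimal, self-contained sketch of the DebugInterpreter idea (assumed behavior; the real class lives in torch._functorch and additionally substitutes shape symbols via the symbol_mapping bound in run()): re-run the FX graph on real inputs and assert that each node's real output matches the shape/stride metadata recorded during fake-tensor tracing.

import torch
import torch.fx
from torch.fx.experimental.proxy_tensor import make_fx

class CheckingInterpreter(torch.fx.Interpreter):
    def run_node(self, n):
        result = super().run_node(n)
        fake = n.meta.get("val")  # FakeTensor recorded at trace time
        if isinstance(fake, torch.Tensor) and isinstance(result, torch.Tensor):
            assert tuple(result.shape) == tuple(fake.shape), f"{n}: shape mismatch"
            assert result.stride() == fake.stride(), f"{n}: stride mismatch"
        return result

def f(x):
    return x.t().contiguous() + 1

# trace with fake tensors so node.meta["val"] carries compile-time strides,
# then replay the graph with real inputs and check every node
gm = make_fx(f, tracing_mode="fake")(torch.randn(3, 4))
CheckingInterpreter(gm).run(torch.randn(3, 4))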
  # NB: NOT cloned!
- with enable_aot_logging():
+ with enable_aot_logging(), torch._dispatch.python.enable_crossref_functionalize(
+     crossref_functionalize
Would it be more logical to push this into AOT Autograd itself?
That's fair. Actually, what do you think of unconditionally running it in aot autograd? These extra checks probably won't be our bottleneck for compile time.
I'm mostly worried about the robustness of crossref functionalize; I'm not sure I got the logic entirely right. If we can prove it out, I'm amenable.
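For context, here is a conceptual sketch of what a crossref functionalization check amounts to (an illustration built on the public torch.func.functionalize wrapper, not the torch._dispatch.python implementation the diff enables): run the function both normally and under functionalization, and assert the outputs agree on layout — exactly the kind of stride divergence this debug mode is meant to surface.

import torch
from torch.func import functionalize

def crossref_check(fn, *args):
    ref = fn(*args)
    # re-run the same computation with mutations/views functionalized away
    func_out = functionalize(fn)(*args)
    assert ref.shape == func_out.shape, "shape diverged under functionalization"
    assert ref.stride() == func_out.stride(), "stride diverged under functionalization"
    return ref

# a view-heavy function whose functionalized strides should match eager
out = crossref_check(lambda x: x.t().contiguous().view(-1), torch.randn(3, 4))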
Gonna put this PR aside for now to focus on my other outstanding PRs. I spent a bit of time debugging why the functionalization stride checks don't play well when turned on for […]. Notes for me: what I see is that at some point when running a decomp underneath fake tensor, we end up calling […]
The last time this happened to me, it was because there was a device= argument and fake tensor hadn't taken care of it (converting it to meta).
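To illustrate the failure mode being described (an assumed minimal example, not the actual repro from this PR): under FakeTensorMode, a factory call with an explicit device= argument should still be intercepted and yield a FakeTensor; if the mode misses the device kwarg, a real allocation slips through.

import torch
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode():
    # a decomp that forwards device= like this relies on fake tensor
    # intercepting the factory call and converting the device to meta
    t = torch.empty(2, 3, device="cpu")
    print(type(t))  # expect torch._subclasses.fake_tensor.FakeTensor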
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
This one... is still relevant I think?
The aot_eager backend now performs two types of stride checks automatically:
(1) at compile time, when running functionalization, we perform stride checks
(2) at runtime, when executing the graph, we perform stride checks

Adding these debug asserts so that they always run in the aot_eager backend has a small downside: at runtime, the backend will technically be slower than eager mode. This tentatively seems OK: the main purpose of this backend is debugging anyway.

This also would have automatically caught a silent correctness error that was fixed in #91029.
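For reference, a minimal usage sketch (any toy function works here; torch.compile is the standard entry point for selecting a backend): compiling with aot_eager is all it takes to get both checks, since they now run unconditionally.

import torch

def f(x):
    return (x.t() + 1).contiguous()

# the compile-time functionalization stride checks run during tracing;
# the runtime checks run on every call to the compiled function
compiled = torch.compile(f, backend="aot_eager")
print(compiled(torch.randn(4, 3)).stride())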
Stack from ghstack (oldest at bottom):
cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire