Skip to content

Conversation

@bdhirsh
Copy link
Contributor

@bdhirsh bdhirsh commented Dec 16, 2022

The aot_eager backend now performs to types of stride checks automatically:
(1) at compile time, when running functioanlization we perform stride checks
(2) at runtime, when executing the graph we perform stride checks

Adding these debug asserts so that they always run in the aot_eager backend has a small downside: at runtime, the backend will technically be slower than eager mode.

This tentatively seems ok: the main purpose of this backend is for debugging purposes anyway.

This also would have caught a silent correctness error automatically, that was fixed at #91029

Stack from ghstack (oldest at bottom):

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire

…nditionally in aot_eager backend

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Dec 16, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91038

Note: Links to docs will display an error until the docs builds have been completed.

❌ 17 Failures

As of commit 26d27d0:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base 9d8fa78:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

bdhirsh added a commit that referenced this pull request Dec 16, 2022
…nditionally in aot_eager backend

ghstack-source-id: 37401ad
Pull Request resolved: #91038
def run(self, *args):
self.symbol_mapping = bind_symbols(self.module, *args)
super().run(*args)
if hasattr(self.module, "shape_env"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oopsie, thanks

# NB: NOT cloned!
with enable_aot_logging():
with enable_aot_logging(), torch._dispatch.python.enable_crossref_functionalize(
crossref_functionalize
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be more logical to push this into AOT Autograd itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. Actually, what do you think of unconditionally running it in aot autograd? These extra checks probably won't be our bottleneck for compile time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mostly worried about the robustness of crossref functionalize, not sure I got the logic entirely right. If we can prove it out I'm amenable.

@bdhirsh
Copy link
Contributor Author

bdhirsh commented Dec 19, 2022

Gonna put this PR aside for now to focus on my other outstanding PR's. I spent a bit of time debugging why the functionalization stride checks don't play well when turned on for aot_eager, but I haven't finished (I bet it's something dumb).

Notes for me: What I see is that some point when running a decomp underneath fake tensor, we end up calling empty_strided() and dispatching to BackendSelect, instead of re-entrantly dispatching to FakeTensor again.

@ezyang
Copy link
Contributor

ezyang commented Dec 19, 2022

The last time this happened to me, it's because there was a device= argument and fake tensor hadn't taken care of it (converting it to meta)

@github-actions
Copy link
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Feb 18, 2023
@ezyang
Copy link
Contributor

ezyang commented Feb 18, 2023

This one... is still relevant I think?

@github-actions github-actions bot closed this Mar 20, 2023
@facebook-github-bot facebook-github-bot deleted the gh/bdhirsh/363/head branch June 8, 2023 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants