-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Properly resolve source_ref when constructing shape guards #91058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Edward Z. Yang <ezyang@fb.com> [ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91058
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 FailuresAs of commit f63c0ef: NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't. For the most part, this didn't matter in practice, because we usually had a `TENSOR_MATCH` guard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when #89707 goes online) then this will indeed matter. One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure. Signed-off-by: Edward Z. Yang <ezyangfb.com> [ghstack-poisoned]
Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't. For the most part, this didn't matter in practice, because we usually had a `TENSOR_MATCH` guard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when #89707 goes online) then this will indeed matter. One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure. Signed-off-by: Edward Z. Yang <ezyangfb.com> [ghstack-poisoned]
Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't. For the most part, this didn't matter in practice, because we usually had a `TENSOR_MATCH` guard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when #89707 goes online) then this will indeed matter. One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure. Signed-off-by: Edward Z. Yang <ezyangfb.com> cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire [ghstack-poisoned]
Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't. For the most part, this didn't matter in practice, because we usually had a `TENSOR_MATCH` guard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when #89707 goes online) then this will indeed matter. One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure. Signed-off-by: Edward Z. Yang <ezyangfb.com> cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire [ghstack-poisoned]
| def is_input_source(source): | ||
| return source.guard_source() in [ | ||
| GuardSource.LOCAL, | ||
| GuardSource.GLOBAL, | ||
| GuardSource.LOCAL_NN_MODULE, | ||
| GuardSource.GLOBAL_NN_MODULE, | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this used? What is an input source?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, dead code. (FWIW, this used to be the set of sources for which select() worked, but I ended up expanding this set.)
| def __init__( | ||
| self, | ||
| id_ref: Callable[[Type[object]], str], | ||
| source_ref: Callable[[Source], str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is a source_ref? I ask because this gets a little confusing... Source has a make_guard on it - and now GuardBuilder takes a source also... this leads to an even more complex interrelationship between Source, Guard, GaurdBuilder and all the systems that bridge them. It may be worth taking a step back and making clearer delineation between guard accumulation and guard installation.
If we are just using this for source_ref->source->builder->arg_ref->globals OR locals->eval(), we may be better off just simplifying simplifying this a ton by just passing a scope into GuardBuilder.
That will also help avoid your cycle... and keep GuardBuilder free of source notions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why, it's id_ref or arg_ref, but you pass it a source!
The _ref series of functions are invoked when processing guards, and indicate that we made use of an id/arg/source in code generating sources. For ids, this induces the builder to attach a weakref finalizer to the object to invalidate the guard if the object dies (preventing ABA problems with the ids). For args/sources, it induces the builder to ensure the local is actually bound in largs. In principle, these could do anything, it's mostly a contract "if you use this arg, you must use arg_ref"
While there is a cycle here, it is not caused by Source. Source is a relatively dumb data structure (indeed, it could be moved to torch._guards with very little hoohah) and it is fine to introduce a dependency on Source. The cycle here results from source_ref itself needing to close over BOTH local and global GuardBuilders. (If it only closed over one GuardBuilder, it could just be a normal method, no muss no fuss.) But why does it need to close over both? Well, the problem is that in the traditional LocalBuilder/GlobalBuilder convention, a guard is supposed to ONLY access a local, or a global. But ShapeEnv is genre defying: it accesses both locals AND globals, all in one go. And indeed, this is irreducible, because you might be doing an equality test between a size from a local tensor, and another size from a global tensor.
So, when this happened to me, I thought, "well, OK, what if I just make a new ShapeEnvBuilder that's dedicated for ShapeEnv? But this does not work either, because LocalBuilder and GlobalBuilder are the source of truth for... all the locals/globals that are accessed (e.g., via conventional arg_ref). If you shove the argument references into a third builder you still have to make sure they get ferried to the approach use sites on the local builder and global builder. This seemed bad, so I opted for the smallest change that got things to work, which involved source_ref so that I could make sure I actually popped the data into the correct builder.
What I kind of want is to not have a separate local/global builder and just have some unified builder, but I'm not really sure how it would work, and it seemed like a big refactor for not enough payoff? But feel free to disagree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind it, notionally, actually - but I feel like the ShapeEnvBuilder model (perhaps spawned off a UnifiedBuilder?) is more cohesive.
And indeed, this is irreducible,
That is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with ShapeEnvBuilder is I have to do another refactor to extract the tracked args from LocalBuilder to live somewhere where ShapeEnvBuilder can add to them too. I mean I suppose I can do it, lmk what you want
Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't. For the most part, this didn't matter in practice, because we usually had a `TENSOR_MATCH` guard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when #89707 goes online) then this will indeed matter. One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure. Signed-off-by: Edward Z. Yang <ezyangfb.com> cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire [ghstack-poisoned]
Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't. For the most part, this didn't matter in practice, because we usually had a `TENSOR_MATCH` guard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when #89707 goes online) then this will indeed matter. One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure. Signed-off-by: Edward Z. Yang <ezyangfb.com> cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire [ghstack-poisoned]
|
@pytorchbot merge -f "master ci flakiness" |
Merge startedYour change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: Command Details for Dev Infra teamRaised by workflow job |
Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't. For the most part, this didn't matter in practice, because we usually had a `TENSOR_MATCH` guard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when #89707 goes online) then this will indeed matter. One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure. Signed-off-by: Edward Z. Yang <ezyangfb.com> cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire [ghstack-poisoned]
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: The following mandatory check(s) failed (Rule Dig deeper by viewing the failures on hud Details for Dev Infra teamRaised by workflow job |
|
@pytorchbot merge -f "disabled test only" |
Merge startedYour change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Stack from ghstack (oldest at bottom):
Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't.
For the most part, this didn't matter in practice, because we usually had a
TENSOR_MATCHguard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when #89707 goes online) then this will indeed matter.One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure.
Signed-off-by: Edward Z. Yang ezyang@fb.com
cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire