Skip to content

Conversation

@ezyang
Copy link
Contributor

@ezyang ezyang commented Dec 17, 2022

Stack from ghstack (oldest at bottom):

Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't.

For the most part, this didn't matter in practice, because we usually had a TENSOR_MATCH guard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when #89707 goes online) then this will indeed matter.

One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure.

Signed-off-by: Edward Z. Yang ezyang@fb.com

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Dec 17, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91058

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures

As of commit f63c0ef:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't.

For the most part, this didn't matter in practice, because we usually had a `TENSOR_MATCH` guard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when #89707 goes online) then this will indeed matter.

One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure.

Signed-off-by: Edward Z. Yang <ezyangfb.com>

[ghstack-poisoned]
Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't.

For the most part, this didn't matter in practice, because we usually had a `TENSOR_MATCH` guard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when #89707 goes online) then this will indeed matter.

One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure.

Signed-off-by: Edward Z. Yang <ezyangfb.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Dec 17, 2022
Signed-off-by: Edward Z. Yang <ezyangfb.com>

ghstack-source-id: bb98c25
Pull Request resolved: #91058
@ezyang ezyang added ciflow/trunk Trigger trunk jobs on your pull request release notes: dynamo labels Dec 17, 2022
Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't.

For the most part, this didn't matter in practice, because we usually had a `TENSOR_MATCH` guard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when #89707 goes online) then this will indeed matter.

One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure.

Signed-off-by: Edward Z. Yang <ezyangfb.com>

cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire

[ghstack-poisoned]
Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't.

For the most part, this didn't matter in practice, because we usually had a `TENSOR_MATCH` guard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when #89707 goes online) then this will indeed matter.

One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure.

Signed-off-by: Edward Z. Yang <ezyangfb.com>

cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Dec 19, 2022
Signed-off-by: Edward Z. Yang <ezyangfb.com>

ghstack-source-id: 600aeb0
Pull Request resolved: #91058
@ezyang ezyang requested a review from jansel December 19, 2022 23:34
Comment on lines +39 to +45
def is_input_source(source):
return source.guard_source() in [
GuardSource.LOCAL,
GuardSource.GLOBAL,
GuardSource.LOCAL_NN_MODULE,
GuardSource.GLOBAL_NN_MODULE,
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this used? What is an input source?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, dead code. (FWIW, this used to be the set of sources for which select() worked, but I ended up expanding this set.)

def __init__(
self,
id_ref: Callable[[Type[object]], str],
source_ref: Callable[[Source], str],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is a source_ref? I ask because this gets a little confusing... Source has a make_guard on it - and now GuardBuilder takes a source also... this leads to an even more complex interrelationship between Source, Guard, GaurdBuilder and all the systems that bridge them. It may be worth taking a step back and making clearer delineation between guard accumulation and guard installation.

If we are just using this for source_ref->source->builder->arg_ref->globals OR locals->eval(), we may be better off just simplifying simplifying this a ton by just passing a scope into GuardBuilder.

That will also help avoid your cycle... and keep GuardBuilder free of source notions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why, it's id_ref or arg_ref, but you pass it a source!

The _ref series of functions are invoked when processing guards, and indicate that we made use of an id/arg/source in code generating sources. For ids, this induces the builder to attach a weakref finalizer to the object to invalidate the guard if the object dies (preventing ABA problems with the ids). For args/sources, it induces the builder to ensure the local is actually bound in largs. In principle, these could do anything, it's mostly a contract "if you use this arg, you must use arg_ref"

While there is a cycle here, it is not caused by Source. Source is a relatively dumb data structure (indeed, it could be moved to torch._guards with very little hoohah) and it is fine to introduce a dependency on Source. The cycle here results from source_ref itself needing to close over BOTH local and global GuardBuilders. (If it only closed over one GuardBuilder, it could just be a normal method, no muss no fuss.) But why does it need to close over both? Well, the problem is that in the traditional LocalBuilder/GlobalBuilder convention, a guard is supposed to ONLY access a local, or a global. But ShapeEnv is genre defying: it accesses both locals AND globals, all in one go. And indeed, this is irreducible, because you might be doing an equality test between a size from a local tensor, and another size from a global tensor.

So, when this happened to me, I thought, "well, OK, what if I just make a new ShapeEnvBuilder that's dedicated for ShapeEnv? But this does not work either, because LocalBuilder and GlobalBuilder are the source of truth for... all the locals/globals that are accessed (e.g., via conventional arg_ref). If you shove the argument references into a third builder you still have to make sure they get ferried to the approach use sites on the local builder and global builder. This seemed bad, so I opted for the smallest change that got things to work, which involved source_ref so that I could make sure I actually popped the data into the correct builder.

What I kind of want is to not have a separate local/global builder and just have some unified builder, but I'm not really sure how it would work, and it seemed like a big refactor for not enough payoff? But feel free to disagree.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind it, notionally, actually - but I feel like the ShapeEnvBuilder model (perhaps spawned off a UnifiedBuilder?) is more cohesive.

And indeed, this is irreducible,

That is true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with ShapeEnvBuilder is I have to do another refactor to extract the tracked args from LocalBuilder to live somewhere where ShapeEnvBuilder can add to them too. I mean I suppose I can do it, lmk what you want

@albanD albanD removed their request for review December 20, 2022 12:12
Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't.

For the most part, this didn't matter in practice, because we usually had a `TENSOR_MATCH` guard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when #89707 goes online) then this will indeed matter.

One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure.

Signed-off-by: Edward Z. Yang <ezyangfb.com>

cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Dec 27, 2022
Signed-off-by: Edward Z. Yang <ezyangfb.com>

ghstack-source-id: 1e2627a
Pull Request resolved: #91058
Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't.

For the most part, this didn't matter in practice, because we usually had a `TENSOR_MATCH` guard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when #89707 goes online) then this will indeed matter.

One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure.

Signed-off-by: Edward Z. Yang <ezyangfb.com>

cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Dec 28, 2022
Signed-off-by: Edward Z. Yang <ezyangfb.com>

ghstack-source-id: 2dac621
Pull Request resolved: #91058
@ezyang
Copy link
Contributor Author

ezyang commented Dec 29, 2022

@pytorchbot merge -f "master ci flakiness"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x 6e64903baac192f968a29e4b425e892e0fd53a84 returned non-zero exit code 1

Auto-merging test/test_proxy_tensor.py
Auto-merging torch/_subclasses/meta_utils.py
Auto-merging torch/fx/experimental/symbolic_shapes.py
CONFLICT (content): Merge conflict in torch/fx/experimental/symbolic_shapes.py
error: could not apply 6e64903baa... Store source, not sname, in Symbol
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
Details for Dev Infra team Raised by workflow job

Whenever you guard on something, you're supposed to tell GuardBuilder about it, so GuardBuilder knows that it has to actually bind it in scope when it creates the guard function. But shape env guards bypass that mechanism completely. Well, now they don't.

For the most part, this didn't matter in practice, because we usually had a `TENSOR_MATCH` guard floating around that made sure that the guard stayed live. But if we ever eliminate those guards (e.g., because we build it into the shape guard directly; something we'll probably want to do when #89707 goes online) then this will indeed matter.

One complication: some of the shape env guards are on globals. You have to make sure to shunt the usage to the correct guard builder in that case. Maybe it would be better if we refactored things so there is only one GuardBuilder. Not sure.

Signed-off-by: Edward Z. Yang <ezyangfb.com>

cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen chunyuan-w XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Dec 29, 2022
Signed-off-by: Edward Z. Yang <ezyangfb.com>

ghstack-source-id: e9cfb9e
Pull Request resolved: #91058
@ezyang
Copy link
Contributor Author

ezyang commented Dec 29, 2022

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: The following mandatory check(s) failed (Rule superuser):

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

@ezyang
Copy link
Contributor Author

ezyang commented Dec 30, 2022

@pytorchbot merge -f "disabled test only"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@facebook-github-bot facebook-github-bot deleted the gh/ezyang/1673/head branch June 8, 2023 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants