[dtensor][4/N] refactor dispatching logic and add propagator #90733
Conversation
This PR refactors the dispatching logic to make it cleaner, and isolates the sharding propagation logic into a separate class, so that more complicated propagation features can be implemented later. [ghstack-poisoned]
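To make the shape of the refactor concrete, here is a minimal sketch of what "isolating sharding propagation into a separate class" could look like. The names below (ShardingPropagator, register_prop_rule, propagate) are illustrative assumptions for this sketch, not the exact identifiers introduced in the PR.

```python
# Hypothetical sketch: dispatch stays thin and delegates sharding decisions
# to a dedicated propagator object that owns the per-operator rules.
from typing import Callable, Dict


class ShardingPropagator:
    """Holds per-operator sharding propagation rules, isolated from dispatch."""

    def __init__(self) -> None:
        # Maps an operator name to a rule that derives output sharding
        # from input sharding.
        self.op_to_rules: Dict[str, Callable] = {}

    def register_prop_rule(self, op_name: str, rule: Callable) -> None:
        self.op_to_rules[op_name] = rule

    def propagate(self, op_name: str, input_sharding):
        rule = self.op_to_rules.get(op_name)
        if rule is None:
            raise NotImplementedError(f"no sharding rule for {op_name}")
        return rule(input_sharding)


propagator = ShardingPropagator()
# Elementwise ops can simply pass the input sharding through.
propagator.register_prop_rule("aten.add", lambda s: s)
print(propagator.propagate("aten.add", "Shard(0)"))  # prints Shard(0)
```

With the rules behind one object, the dispatcher only needs to call `propagate`, which is what makes more complicated propagation strategies easy to add later.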
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90733
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 213b785. This comment was automatically generated by Dr. CI and updates every 15 minutes.
XilunWu
left a comment
Great work! Thanks for refactoring the op dispatching and sharding propagation facilities. Some typos to fix.
    if op_call in _CURRENT_DECOMPOSITION_TABLE:
        return _CURRENT_DECOMPOSITION_TABLE[op_call](*args, **kwargs)

    # STEP 0. See if threre're user defined custom aten operator
typo: threre're
    # implementations. Custom operators take the highest priority
    if custom_dispatch_ops is not None and str(op_call) in custom_dispatch_ops:
        # dispatch to user defined custom distributed tensor ops
        return custom_dispatch_ops[str(op_call)](*args, **kwargs)
question: will this custom_dispatch_ops be deprecated once register_impl is no longer needed? I assume that eventually we want to get rid of register_impl and fully adopt propagation rules.
Yes we should deprecate this once we move all ops to use propagation rules
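The discussion above concerns dispatch priority: decompositions first, then user-registered custom ops, then sharding propagation. A hedged, self-contained sketch of that ordering is below; the tables and the `"propagate"` fall-through are stand-ins for this illustration, not the PR's actual code.

```python
# Sketch of the dispatch priority discussed in the review thread.
_CURRENT_DECOMPOSITION_TABLE = {}  # op -> decomposed implementation
custom_dispatch_ops = {"aten.matmul": lambda *a, **k: "custom matmul"}


def dispatch(op_call, *args, **kwargs):
    # First: registered decompositions take effect before everything else.
    if op_call in _CURRENT_DECOMPOSITION_TABLE:
        return _CURRENT_DECOMPOSITION_TABLE[op_call](*args, **kwargs)
    # STEP 0: user-defined custom operator implementations win next.
    if custom_dispatch_ops is not None and str(op_call) in custom_dispatch_ops:
        return custom_dispatch_ops[str(op_call)](*args, **kwargs)
    # Otherwise fall through to sharding propagation (elided here).
    return "propagate"


print(dispatch("aten.matmul"))  # prints custom matmul
print(dispatch("aten.add"))     # prints propagate
```

Once all ops have propagation rules, the STEP 0 branch (and `register_impl`) can be dropped entirely, which is the deprecation path agreed on above.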
torch/distributed/_tensor/prop.py
Outdated
    if sharding_prop_func is None:
        # step 1. If there's not even one sharding rule
        # implemented for the operator, we fall back to
        # local tensor compute, this is wront currently
typo: wront
torch/distributed/_tensor/prop.py
Outdated
        # implemented for the operator, we fall back to
        # local tensor compute, this is wront currently
        # we will change the behavior to reshard to full
        # replicate and do the computatation
typo: computatation
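The comments above describe the fallback path: when no sharding rule exists for an operator, the current code falls back to local tensor compute, and the intended fix is to first reshard inputs to full replication. A minimal sketch of that intended behavior follows; `propagate_op_sharding` and `reshard_to_replicate` are hypothetical names for this illustration.

```python
# Sketch of the intended fallback: with no sharding rule registered,
# reshard every input to Replicate so local compute on each rank sees
# the full tensor, then run the op locally.

def reshard_to_replicate(t):
    # Placeholder for an allgather-based reshard to full replication.
    return ("Replicate", t)


def propagate_op_sharding(op_name, sharding_prop_funcs, inputs):
    sharding_prop_func = sharding_prop_funcs.get(op_name)
    if sharding_prop_func is None:
        # Fallback branch described in the review comments.
        inputs = [reshard_to_replicate(t) for t in inputs]
        return "local_compute", inputs
    return "rule", sharding_prop_func(inputs)


mode, out = propagate_op_sharding("aten.unknown", {}, ["shard0"])
print(mode)  # prints local_compute
```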
@wanchaol has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This PR refactors the dispatching logic to make it cleaner, and isolates the sharding propagation logic into a separate class, so that more complicated propagation features can be implemented later. Differential Revision: [D42876251](https://our.internmc.facebook.com/intern/diff/D42876251) [ghstack-poisoned]
fduwjj
left a comment
LGTM
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
This PR refactors the dispatching logic to make it cleaner, and
isolates the sharding propagation logic into a separate class,
so that more complicated propagation features can be implemented
later.
Differential Revision: D42876251