[dtensor][5/N] add cached propagator for TP #90734
Conversation
This PR adds a cached propagator for TP use; it caches the sharding propagation decision for the same input sharding on an operator. This could improve eager-mode performance.
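The caching idea in the description can be sketched roughly as follows. This is an illustration only: `ShardingPropagator`, `propagate_op_sharding`, and the spec strings are hypothetical stand-ins, not the actual DTensor internals. The cache is keyed on the operator plus the exact input sharding, so repeated eager-mode calls with the same sharding skip re-running the propagation rule.

```python
# Hypothetical sketch of a caching sharding propagator. All names here are
# illustrative stand-ins, not the real DTensor implementation.
from typing import Dict, Hashable, Tuple


class ShardingPropagator:
    """Recomputes the output sharding on every call (no cache)."""

    def propagate_op_sharding(self, op: str, input_specs: Tuple[Hashable, ...]):
        # Stand-in for the real (potentially expensive) propagation rule.
        return (op, input_specs, "computed")


class CachingPropagator(ShardingPropagator):
    """Memoizes propagation results keyed on (op, exact input shardings)."""

    def __init__(self) -> None:
        super().__init__()
        self._cache: Dict[Tuple[str, Tuple[Hashable, ...]], object] = {}

    def propagate_op_sharding(self, op, input_specs):
        key = (op, tuple(input_specs))
        if key not in self._cache:
            self._cache[key] = super().propagate_op_sharding(op, input_specs)
        return self._cache[key]
```

A second call with the same operator and input specs returns the cached object instead of recomputing, which is where the eager-mode speedup would come from.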
Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/90734.
Note: links to docs will display an error until the doc builds have completed. As of commit c74fec7, there is 1 new CI failure.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
XilunWu
left a comment
Thanks for enabling _CachingPropagator in the TP API.
@wanchaol has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
fduwjj
left a comment
Thanks for making this happen! Just a small question; unblocking for now.
```python
# switch the DTensor propagator to use the caching propagator to speed up
# the TP eager execution time.
DTensor._propagator = _CachingPropagator(DTensor._propagator.op_to_rules)
```
Can we have a flag to switch this on/off, or do we want to enable it by default?
I feel we should turn this on by default; otherwise eager-mode perf would be very bad, especially once we implement more complicated sharding propagation. This caching logic is relatively safe, as it only reuses a decision for calls with the exact same input specs.
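The safety argument above — that a cached decision is only reused for the exact same input specs — can be shown with a tiny sketch (all names here are illustrative, not the real DTensor API): any change in the input sharding produces a different cache key and falls through to a fresh propagation.

```python
# Illustrative sketch: cache keyed on (op, exact input sharding specs).
# "Shard(...)"/"Replicate()" strings stand in for real placement specs.
cache = {}

def propagate(op, input_specs):
    # Stand-in for an expensive sharding-propagation rule.
    return f"output sharding for {op} with {tuple(input_specs)}"

def cached_propagate(op, input_specs):
    key = (op, tuple(input_specs))
    if key not in cache:  # only an exact match reuses a cached result
        cache[key] = propagate(op, input_specs)
    return cache[key]

cached_propagate("aten.mm", ["Shard(0)", "Replicate()"])
cached_propagate("aten.mm", ["Shard(0)", "Replicate()"])  # exact match: hit
cached_propagate("aten.mm", ["Shard(1)", "Replicate()"])  # new spec: miss
```

Because stale entries can never be returned for a sharding combination that was not seen before, enabling the cache by default does not change any propagation result, only how often the rule is re-evaluated.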
@pytorchbot merge (initiating merge automatically since the Phabricator diff has merged)
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki.
How are these linked to the incorrect diff that was landed? From what I understand, the correct diff for this is D42888015. cc @bigfootjon
Ping me on Workchat and I'll take a look tomorrow.
Stack from ghstack (oldest at bottom):
This PR adds a cached propagator for TP use; it caches the sharding propagation decision for the same input sharding on an operator. This could improve eager-mode performance.
Differential Revision: D42876249