
Conversation

@wanchaol (Collaborator) commented Dec 13, 2022

Stack from ghstack (oldest at bottom):

This PR adds a cached propagator for TP use: it caches the sharding
propagation decision for the same input sharding on an operator, which could
improve eager mode performance.

Differential Revision: D42876249
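
For context, here is a minimal illustrative sketch of the caching idea (hypothetical names; this is not the actual DTensor implementation): the propagator memoizes the output sharding decision keyed by the operator and the exact input sharding specs, so repeated eager calls with the same shardings skip the propagation work.

```python
# Illustrative sketch only -- hypothetical names, not the DTensor internals.
from typing import Callable, Dict, Hashable, Tuple

PropKey = Tuple[str, Tuple[Hashable, ...]]


class CachingShardingPropagator:
    """Memoizes sharding propagation decisions per (op, input shardings)."""

    def __init__(self, propagate_fn: Callable[[str, Tuple[Hashable, ...]], object]):
        self._propagate_fn = propagate_fn        # the (expensive) full propagation
        self._cache: Dict[PropKey, object] = {}  # decision cache

    def propagate(self, op: str, input_specs: Tuple[Hashable, ...]) -> object:
        key = (op, input_specs)
        if key not in self._cache:
            # Cache miss: run full sharding propagation once and remember the result.
            self._cache[key] = self._propagate_fn(op, input_specs)
        # Cache hit: reuse the decision made for the exact same input sharding.
        return self._cache[key]
```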

pytorch-bot bot commented Dec 13, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90734

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failure

As of commit c74fec7:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@wanchaol added the release notes: distributed (dtensor) label (release notes category) on Dec 20, 2022
@wanchaol changed the title from [dtensor][4/N] add cached propagator for TP to [dtensor][5/N] add cached propagator for TP on Jan 6, 2023
@wanchaol requested review from XilunWu, aazzolini, and fduwjj and removed the review request for pritamdamania87 on January 6, 2023 at 23:07
@XilunWu (Contributor) left a comment:

Thanks for enabling _CachingPropagator in the TP API.

@wanchaol (Collaborator, Author) commented:

@wanchaol has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@fduwjj (Contributor) left a comment:

Thanks for making this happen! Just one quick question. Unblocking for now.


# switch the DTensor propagator to use the caching propagator to speed up
# the TP eager execution time.
DTensor._propagator = _CachingPropagator(DTensor._propagator.op_to_rules)
A Contributor commented on the line above:

Can we have a flag to switch this on/off, or do we want to enable it by default?

@wanchaol (Collaborator, Author) replied:

I feel we should turn this on by default; otherwise eager mode perf would be very bad, especially once we implement more complicated sharding prop. This caching logic is relatively safe, as it only changes behavior for calls with the exact same input specs.
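
To make the "exact same input specs" point concrete, here is a tiny self-contained sketch (plain functools.lru_cache with made-up spec strings; not the DTensor code): a cached decision is reused only when the operator and the input sharding specs match exactly, while any different input sharding still goes through full propagation.

```python
# Hypothetical illustration of exact-match caching; not the DTensor implementation.
from functools import lru_cache


@lru_cache(maxsize=None)
def propagate_sharding(op: str, input_specs: tuple) -> str:
    # Stand-in for the real (expensive) sharding propagation.
    return f"output sharding for {op} with inputs {input_specs}"


propagate_sharding("aten.mm", ("Shard(0)", "Replicate()"))  # computed and cached
propagate_sharding("aten.mm", ("Shard(0)", "Replicate()"))  # exact same specs: cache hit
propagate_sharding("aten.mm", ("Shard(1)", "Replicate()"))  # different sharding: recomputed
```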

@facebook-github-bot (Contributor) commented:

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) on Feb 1, 2023
@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@seemethere (Member) commented Feb 1, 2023

> @wanchaol has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

How are these linked to the incorrect diff that was landed? From what I understand, the correct diff for this is D42888015.

cc @bigfootjon

@bigfootjon (Member) commented:

Ping me on workchat and I'll take a look tomorrow

@facebook-github-bot deleted the gh/wanchaol/238/head branch on June 8, 2023 at 19:07

Labels

ciflow/trunk (Trigger trunk jobs on your pull request) · Merged · release notes: distributed (dtensor) (release notes category)


8 participants