[DTensor] implement dist_split as a sharding prop rule #93306

XilunWu · 2023-01-30T21:36:01Z

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]

pytorch-bot · 2023-01-30T21:36:04Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/93306

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e1cc5d5:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 17951d9 Pull Request resolved: #93306

[ghstack-poisoned]

ghstack-source-id: 95deefb Pull Request resolved: #93306

wanchaol

Almost ready! Have some nits and suggestions inlined.

torch/distributed/_tensor/ops/tensor_ops.py

wanchaol · 2023-02-01T20:58:09Z

+
+    # TODO: just like slice op, split replicates before splitting
+    # on a sharded dimension
+    # TODO: shall we consider partial???


We should consider partial (maybe we can add this later). Because the dtensor_ops test does not generate partial inputs, we would also need to add partial inputs to the op db.
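The TODO in the hunk above says that split replicates before splitting on a sharded dimension. A minimal sketch of what that means at the data level, using plain Python lists instead of DTensor: `all_gather` and `split` here are hypothetical toy helpers written for this illustration, and the two-rank layout is an assumption.

```python
from typing import List


def all_gather(shards: List[List[int]]) -> List[int]:
    """Toy stand-in for unsharding: concatenate every rank's shard in rank order."""
    return [x for shard in shards for x in shard]


def split(values: List[int], split_size: int) -> List[List[int]]:
    """Split a flat list into chunks of split_size; the last chunk may be shorter."""
    return [values[i:i + split_size] for i in range(0, len(values), split_size)]


# Assumed layout: two ranks each hold half of an 8-element dimension.
shards = [[0, 1, 2, 3], [4, 5, 6, 7]]

# Replicate first (every rank sees the full dimension), then split.
replicated = all_gather(shards)
chunks = split(replicated, 3)
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6, 7]]
```

Note that the middle chunk `[3, 4, 5]` straddles the boundary between the two shards, which is one reason a rule might fall back to replicating before it splits.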

torch/distributed/_tensor/ops/tensor_ops.py

wanchaol · 2023-02-01T23:43:47Z

+            placements=unshard_tensor_dim(input_spec.placements, dim=dim),
+            shape=input_spec.shape,
+            ndim=input_spec.ndim,
+        )


nit: add a check for a partial input_spec and raise NotImplementedError, so we know to implement this later?

sounds good!
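The check suggested in this thread could look roughly like the sketch below. It is not the code from the PR: `Shard`, `Replicate`, and `Partial` are local toy stand-ins for DTensor placements, and `split_input_placements` is a hypothetical helper that only mimics the idea of unsharding the split dimension and rejecting partial inputs.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, Union


# Toy placement types defined only for this sketch (not the torch classes).
@dataclass(frozen=True)
class Shard:
    dim: int


@dataclass(frozen=True)
class Replicate:
    pass


@dataclass(frozen=True)
class Partial:
    pass


Placement = Union[Shard, Replicate, Partial]


def split_input_placements(
    placements: Sequence[Placement], dim: int
) -> Tuple[Placement, ...]:
    """Return the placements the input would need before splitting on `dim`."""
    # Fail loudly on partial inputs so the gap is visible until it is implemented.
    if any(isinstance(p, Partial) for p in placements):
        raise NotImplementedError("split on a partial input is not supported yet")
    # Replace a shard on the split dimension with Replicate; keep everything else.
    return tuple(
        Replicate() if isinstance(p, Shard) and p.dim == dim else p
        for p in placements
    )


# On a 2-D mesh, only the placement that shards dim 1 is changed.
result = split_input_placements((Shard(0), Shard(1)), dim=1)
print(result)  # (Shard(dim=0), Replicate())
```

Raising early keeps the unsupported case from silently producing a wrong sharding, which matches the intent of the nit.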

[ghstack-poisoned]

ghstack-source-id: fb8f536 Pull Request resolved: #93306

XilunWu · 2023-02-02T04:57:52Z

@pytorchmergebot merge -g

pytorchmergebot · 2023-02-02T04:59:41Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

fduwjj · 2023-05-02T17:08:25Z

torch/distributed/_tensor/ops/tensor_ops.py

+    need_reshard = False
+    if is_tensor_dim_sharded(input_spec, dim=dim):


This somehow broke TP's code logic. A common pattern is to shard the DTensor on the last dim and then call split on the last dim too. We still want the result to be sharded on dim=-1.

Currently, after split, we get a replicated DTensor.
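A toy illustration of the two behaviors contrasted in this comment, using plain Python lists. It assumes that "sharded result" means each rank splits only its own local shard; the comment does not spell that out, and the two-rank layout, the split sizes, and the `split` helper are all hypothetical.

```python
from typing import List


def split(values: List[int], split_size: int) -> List[List[int]]:
    """Split a flat list into chunks of split_size; the last chunk may be shorter."""
    return [values[i:i + split_size] for i in range(0, len(values), split_size)]


# Assumed layout: 2 ranks, each holding 4 of the 8 elements along the last dim.
local_shards = [[0, 1, 2, 3], [4, 5, 6, 7]]

# Behavior reported above: the input is unsharded first, so every rank
# ends up holding the full, replicated outputs.
full = [x for shard in local_shards for x in shard]
replicated_outputs = split(full, 4)

# Assumed TP-style expectation: each rank splits its own shard and the
# outputs stay sharded across ranks, with no communication.
per_rank_outputs = [split(shard, 2) for shard in local_shards]

print(replicated_outputs)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(per_rank_outputs)    # [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
```

In this toy setup the replicated path leaves each rank holding all 8 elements, while the local path leaves each rank holding only its own 4.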

[DTensor] implement dist_split as a sharding prop rule

56c4cea

[ghstack-poisoned]

XilunWu requested review from H-Huang, awgu, kwen2501, mrshenli, rohan-varma, wanchaol and zhaojuanmao as code owners January 30, 2023 21:36

XilunWu added a commit that referenced this pull request Jan 30, 2023

[DTensor] implement dist_split as a sharding prop rule

98e3432

ghstack-source-id: 17951d9 Pull Request resolved: #93306

XilunWu marked this pull request as draft January 30, 2023 21:36

XilunWu changed the title ~~[DTensor] implement dist_split as a sharding prop rule~~ [WIP] [DTensor] implement dist_split as a sharding prop rule Jan 30, 2023

XilunWu added the release notes: distributed (dtensor) release notes category label Jan 30, 2023

Update on "[WIP] [DTensor] implement dist_split as a sharding prop rule"

e5b8af3

[ghstack-poisoned]

This was referenced Feb 1, 2023

[DTensor] fix DTensorSpec dim_map description #93160

Closed

[DTensor][fix] MultiThreadedTestCase misses _tls object and it won't reflect in CI #93832

Closed

XilunWu marked this pull request as ready for review February 1, 2023 08:28

XilunWu changed the title ~~[WIP] [DTensor] implement dist_split as a sharding prop rule~~ [DTensor] implement dist_split as a sharding prop rule Feb 1, 2023

Update on "[DTensor] implement dist_split as a sharding prop rule"

61c676a

[ghstack-poisoned]

XilunWu added a commit that referenced this pull request Feb 1, 2023

[DTensor] implement dist_split as a sharding prop rule

d7f1c3d

ghstack-source-id: 95deefb Pull Request resolved: #93306

wanchaol reviewed Feb 1, 2023

wanchaol approved these changes Feb 1, 2023

XilunWu added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 2, 2023

Update on "[DTensor] implement dist_split as a sharding prop rule"

e1cc5d5

[ghstack-poisoned]

XilunWu added a commit that referenced this pull request Feb 2, 2023

[DTensor] implement dist_split as a sharding prop rule

b2561e0

ghstack-source-id: fb8f536 Pull Request resolved: #93306

pytorchmergebot added the Merged label Feb 2, 2023

pytorchmergebot closed this in 6f3018d Feb 2, 2023

XilunWu deleted the gh/XilunWu/14/head branch April 11, 2023 21:40

fduwjj reviewed May 2, 2023

5 participants
