[DeviceMesh] Implement a device mesh concatenate api for submesh and SPMD use case #163358

fduwjj · 2025-09-19T18:34:58Z

Stack from ghstack (oldest at bottom):

[DeviceMesh][2D] Use concatenate for 2D (FSDP+TP) instead of getting from root mesh #165492
-> [DeviceMesh] Implement a device mesh concatenate api for submesh and SPMD use case #163358
[DeviceMesh] Use _flatten_rank_map to replace _flatten_mesh_list so that we don't need to compare root mesh (#166003) #166264

Today, FSDP needs to slice the SPMD mesh out of the root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. What users essentially want, though, is to concatenate several submeshes into one larger mesh and use it as the SPMD mesh. This PR tentatively implements that API.
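To make the idea concrete, here is a minimal pure-Python sketch. It is not the PR's implementation. Each submesh is modeled as a hypothetical `Layout`: a tuple of `(size, stride)` modes whose strides are counted in ranks of a row-major root mesh, loosely in the spirit of the CuTe layout bookkeeping elsewhere in this stack. Under that assumption, concatenating submeshes amounts to concatenating their modes, and the resulting SPMD process groups are the translated copies of the layout's offsets that tile the world. The names `concat_layouts`, `offsets`, and `rank_groups`, and the `dp`/`cp`/`tp` root, are illustrative only.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical layout: ((size, stride), ...), strides counted in ranks of the root mesh.
Layout = Tuple[Tuple[int, int], ...]


def concat_layouts(*layouts: Layout) -> Layout:
    """Concatenate submesh layouts mode by mode, preserving their order."""
    return tuple(mode for layout in layouts for mode in layout)


def offsets(layout: Layout) -> List[int]:
    """Rank offset of every coordinate in the layout, in row-major coordinate order."""
    strides = [stride for _, stride in layout]
    return [
        sum(c * s for c, s in zip(coord, strides))
        for coord in product(*(range(size) for size, _ in layout))
    ]


def rank_groups(layout: Layout, world_size: int) -> List[List[int]]:
    """Partition ranks 0..world_size-1 into copies of `layout`, one per process group."""
    offs = offsets(layout)
    covered = set()
    groups = []
    for base in range(world_size):
        if base in covered:
            continue
        group = [base + o for o in offs]
        # Sketch-level sanity check: each copy must be injective, fit in the world,
        # and not overlap previously built groups.
        if (
            len(set(group)) != len(group)
            or any(r >= world_size for r in group)
            or covered.intersection(group)
        ):
            raise ValueError(f"layout {layout} does not tile {world_size} ranks")
        covered.update(group)
        groups.append(group)
    return groups


# Root mesh of shape (2, 2, 2), e.g. ("dp", "cp", "tp"), with row-major strides (4, 2, 1).
dp: Layout = ((2, 4),)
tp: Layout = ((2, 1),)
spmd = concat_layouts(dp, tp)
print(spmd)                  # ((2, 4), (2, 1))
print(rank_groups(spmd, 8))  # [[0, 1, 4, 5], [2, 3, 6, 7]]
```

Concatenating `dp` and `tp` while skipping the middle `cp` dimension yields a 2x2 SPMD mesh per `cp` index, which is the kind of mesh one would otherwise have to slice out of the root.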

Note that all submeshes must be sliced, flattened, or unflattened from the same root mesh; otherwise their indices are meaningless for mesh indexing and device allocation.
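A hypothetical illustration of why the shared root matters: if each submesh is described by `(size, stride)` modes in rank units of one root mesh, a valid concatenation must map distinct coordinates to distinct ranks inside the world. Mixing a mode taken from a `(4, 2)` root with one taken from a `(2, 4)` root produces colliding offsets. The `is_valid_concat` helper below is an illustrative check loosely modeled on the idea of the layout overlap check referenced in this stack (#163367); it is not that function.

```python
from itertools import product
from typing import Tuple

# Hypothetical layout: ((size, stride), ...) in rank units of a row-major root mesh.
Layout = Tuple[Tuple[int, int], ...]


def is_valid_concat(layout: Layout, world_size: int) -> bool:
    """True if every coordinate of the layout maps to a distinct rank inside the world."""
    strides = [stride for _, stride in layout]
    offs = [
        sum(c * s for c, s in zip(coord, strides))
        for coord in product(*(range(size) for size, _ in layout))
    ]
    return len(set(offs)) == len(offs) and max(offs) < world_size


# Dims 0 and 2 of the same (2, 2, 2) root, strides (4, 2, 1): valid.
same_root = ((2, 4), (2, 1))
# Dim 0 of a (4, 2) root (stride 2) mixed with dim 1 of a (2, 4) root (stride 1): invalid.
mixed_roots = ((4, 2), (4, 1))
print(is_valid_concat(same_root, 8))    # True
print(is_valid_concat(mixed_roots, 8))  # False
```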

cc @H-Huang @awgu @wanchaol @fegin @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim @dcci

Differential Revision: D85409698

[ghstack-poisoned]

pytorch-bot · 2025-09-19T18:35:03Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163358


✅ You can merge normally! (1 Unrelated Failure)

As of commit cc835fc with merge base 000f495:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

trunk / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, linux.2xlarge, unstable) (gh) (#166072)
examples/models/llama3_2_vision/text_decoder/test/test_text_decoder.py::TextDecoderTest::test_llama3_2_text_decoder_aoti

This comment was automatically generated by Dr. CI and updates every 15 minutes.


ezyang · 2025-09-21T01:16:19Z

torch/distributed/device_mesh.py

+                get_world_size(),
+            )
+
+            for mesh_nd in pg_ranks_by_dim:


?!?! Why do you need to do it for every mesh_nd? Is this because you're triggering comms to initialize PGs?

Long story short: every rank needs to call new_group, which is buried deep in the stack, to initialize the PGs; otherwise the code hangs.
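The pattern behind this reply, shown as a pure-Python simulation rather than real process-group code: `torch.distributed.new_group` is documented as requiring every process in the job to enter it, even processes that will not be members of the group, and groups should be created in the same order on all processes. So a rank cannot skip the groups it does not belong to, which is why the loop visits every entry. `ranks_along_dim` reproduces the grouping one gets from `mesh.swapdims(-1, dim).reshape(-1, mesh.size(dim))` on a row-major rank tensor, and `new_group_plan` records the calls each rank would issue; both helpers are hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple


def ranks_along_dim(shape: Sequence[int], dim: int) -> List[List[int]]:
    """1-D rank groups of a row-major mesh that vary only along `dim`."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    others = [d for d in range(len(shape)) if d != dim]
    groups = []
    # Fix every coordinate except `dim`, then sweep `dim` to collect one group.
    for coord in product(*(range(shape[d]) for d in others)):
        base = sum(c * strides[d] for c, d in zip(coord, others))
        groups.append([base + k * strides[dim] for k in range(shape[dim])])
    return groups


def new_group_plan(
    rank: int, groups: List[List[int]]
) -> Tuple[List[Tuple[int, ...]], Optional[Tuple[int, ...]]]:
    """Calls `rank` would issue, plus the group it keeps (simulation, no real PGs)."""
    calls = []
    mine = None
    for ranks in groups:
        # Stand-in for dist.new_group(ranks): every rank enters it, member or not,
        # in the same order, because group creation involves the whole job.
        calls.append(tuple(ranks))
        if rank in ranks:
            mine = tuple(ranks)
    return calls, mine


groups = ranks_along_dim((2, 4), dim=0)
print(groups)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
plans = {r: new_group_plan(r, groups) for r in range(8)}
assert all(plan[0] == plans[0][0] for plan in plans.values())  # identical call order
print(plans[5][1])  # (1, 5)
```

If only members called `new_group`, ranks would disagree on how many group creations they had performed, which is the kind of mismatch that shows up as a hang.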


fduwjj · 2025-10-23T23:23:56Z

@pytorchbot merge -i

pytorchmergebot · 2025-10-23T23:25:38Z

Merge started

Your change will be merged while ignoring the following 2 checks: trunk / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, lf.linux.2xlarge, unstable), trunk / linux-jammy-rocm-py3.10 / test (default, 2, 2, linux.rocm.gpu.gfx942.1)


clee2000 · 2025-10-24T15:57:10Z

@pytorchbot revert -m "probably need to revert this one too, its stacked with #166003 (comment)" -c ghfirst

pytorchmergebot · 2025-10-24T15:58:49Z

@pytorchbot successfully started a revert job. Check the current status here.

…esh and SPMD use case (#163358)" This reverts commit 5a4997d. Reverted #163358 on behalf of https://github.com/clee2000 due to probably need to revert this one too, its stacked with #166003 (comment) ([comment](#163358 (comment)))

pytorchmergebot · 2025-10-24T15:58:57Z

@fduwjj your PR has been successfully reverted.


fduwjj · 2025-10-27T04:06:35Z

@pytorchbot merge

pytorchmergebot · 2025-10-27T04:08:25Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


fduwjj · 2025-10-27T14:19:25Z

@fduwjj has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

fduwjj · 2025-10-27T14:26:35Z

@fduwjj has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

…from root mesh (#165492) With concatenate API, we can directly combine two meshes together rather than getting the spmd mesh from root. Differential Revision: [D85409698](https://our.internmc.facebook.com/intern/diff/D85409698) Pull Request resolved: #165492 Approved by: https://github.com/fegin ghstack dependencies: #163358

…SPMD use case (#163358) Today FSDP needs to slicing out spmd mesh from root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But essentially, users want is a concatenate of some submesh into a big mesh and used as a spmd mesh. This PR is tentatively trying to implement this API for users. One thing to note is that, all sub-mesh needs to slicing/flatten or unflatten from same root mesh otherwise the indices make no sense when it comes to mesh indexing and device allocation. Pull Request resolved: #163358 Approved by: https://github.com/fegin


[For Discussion][DeviceMesh] Implement a concatenate api for submesh

09db0ef

[ghstack-poisoned]

This was referenced Sep 19, 2025

[DeviceMesh] Introduce CuTe layout into devicemesh code base for internal bookkeeping #163212

Closed

[DeviceMesh] Add extra check in flatten result cache lookup #163288

Closed

pytorch-bot bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Sep 19, 2025

fduwjj mentioned this pull request Sep 19, 2025

[DeviceMesh] Simplifying internal bookkeeping with CuTe layout #163213

Closed

fduwjj mentioned this pull request Sep 19, 2025

[device_mesh] Implement _unflatten on top of CuTe layout bookkeeping #161224

Closed

fduwjj added a commit that referenced this pull request Sep 19, 2025

[For Discussion][DeviceMesh] Implement a concatenate api for submesh

9a6554e

ghstack-source-id: 814f54f Pull Request resolved: #163358

fduwjj requested review from ezyang, fegin and tianyu-l September 19, 2025 18:42

fduwjj added the release notes: DeviceMesh label Sep 19, 2025

fduwjj mentioned this pull request Sep 19, 2025

[CuTe] Add layout overlap checking util function in _MeshLayout #163367

Closed

fduwjj added a commit that referenced this pull request Sep 19, 2025

[For Discussion][DeviceMesh] Implement a concatenate api for submesh

0cb4d74

ghstack-source-id: dd5ba8d Pull Request resolved: #163358

fduwjj added a commit that referenced this pull request Sep 20, 2025

[For Discussion][DeviceMesh] Implement a concatenate api for submesh

c6a4a27

ghstack-source-id: 4544de6 Pull Request resolved: #163358

ezyang reviewed Sep 21, 2025

torch/distributed/device_mesh.py Outdated

ezyang reviewed Sep 21, 2025

torch/distributed/device_mesh.py

ezyang reviewed Sep 21, 2025

fduwjj added a commit that referenced this pull request Sep 24, 2025

[For Discussion][DeviceMesh] Implement a concatenate api for submesh

3fe5b5f

ghstack-source-id: 5bd2d76 Pull Request resolved: #163358

fduwjj added a commit that referenced this pull request Sep 25, 2025

[For Discussion][DeviceMesh] Implement a concatenate api for submesh

e9a267c

ghstack-source-id: de04fad Pull Request resolved: #163358

fduwjj added a commit that referenced this pull request Sep 27, 2025

[For Discussion][DeviceMesh] Implement a concatenate api for submesh

abb7386

ghstack-source-id: 14dff21 Pull Request resolved: #163358

fduwjj added a commit that referenced this pull request Sep 30, 2025

[For Discussion][DeviceMesh] Implement a concatenate api for submesh

c7f630a

ghstack-source-id: b6a332e Pull Request resolved: #163358

fduwjj added a commit that referenced this pull request Oct 1, 2025

[For Discussion][DeviceMesh] Implement a concatenate api for submesh

829e42a

ghstack-source-id: 367e222 Pull Request resolved: #163358

fduwjj added a commit that referenced this pull request Oct 23, 2025

[For Discussion][DeviceMesh] Implement a concatenate api for submesh

c950dbc

ghstack-source-id: 9f368c8 Pull Request resolved: #163358

pytorchmergebot added the merging label Oct 23, 2025

pytorchmergebot added the Merged label Oct 23, 2025

pytorchmergebot closed this in 5a4997d Oct 23, 2025

pytorchmergebot removed the merging label Oct 23, 2025

pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Oct 24, 2025

pytorchmergebot reopened this Oct 24, 2025

fduwjj added a commit that referenced this pull request Oct 27, 2025

[For Discussion][DeviceMesh] Implement a concatenate api for submesh

9ae18a2

ghstack-source-id: 2cdb521 Pull Request resolved: #163358

pytorchmergebot added the merging label Oct 27, 2025

pytorchmergebot closed this in 6530bc7 Oct 27, 2025

pytorchmergebot removed the merging label Oct 27, 2025

github-actions bot deleted the gh/fduwjj/206/head branch November 28, 2025 02:18


6 participants
