
Conversation

@fduwjj
Contributor

@fduwjj fduwjj commented Oct 21, 2025

Stack from ghstack (oldest at bottom):

Since we already share a flattened `_rank_map` tensor across all meshes derived from the same root mesh, we can compare a flattened list of it instead of comparing `root_mesh` and `flattened_mesh_list` (with the same `_rank_map` and layout, the mesh tensor is guaranteed to be the same). This also lets us win back the CPU overhead added in #164510 and further simplify the code.

There is a more ambitious universe-based change in #165680, but it needs more discussion and would be BC-breaking. We might eventually merge that PR, but probably not now; this change is not BC-breaking and will help `concatenate` and the 2D integration with `concatenate`.
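For illustration, here is a minimal, self-contained sketch of that reasoning. It is not the actual `DeviceMesh` implementation: `ToyMesh`, `_layout`, `_flatten_rank_map`, and `belongs_to_same_root` are assumed names, and the real layout bookkeeping is richer than a plain `view`; only the idea of comparing the shared `_rank_map` is taken from the description above.

```python
# Minimal sketch of the idea above; not the actual DeviceMesh code.
import torch


class ToyMesh:
    def __init__(self, rank_map: torch.Tensor, layout: tuple[int, ...]):
        # Every mesh sliced/flattened/unflattened from the same root mesh
        # shares this flattened rank_map tensor.
        self._rank_map = rank_map
        self._layout = layout

    def _flatten_rank_map(self) -> tuple[int, ...]:
        # A hashable, cheap-to-compare view of the shared rank map.
        return tuple(self._rank_map.tolist())

    def mesh_tensor(self) -> torch.Tensor:
        # With the same _rank_map and layout, this reconstructed mesh tensor
        # is identical, which is why comparing the rank map is sufficient.
        # (Real DeviceMesh layouts are more involved than a simple view.)
        return self._rank_map.view(self._layout)


def belongs_to_same_root(a: ToyMesh, b: ToyMesh) -> bool:
    # Replaces comparing root_mesh plus a flattened_mesh_list: meshes from
    # the same root share the rank map, so comparing it directly is enough.
    return a._flatten_rank_map() == b._flatten_rank_map()


if __name__ == "__main__":
    rank_map = torch.arange(8)
    dp_tp = ToyMesh(rank_map, (2, 4))   # a 2D mesh over the same 8 ranks
    flat = ToyMesh(rank_map, (8,))      # its flattened counterpart
    assert belongs_to_same_root(dp_tp, flat)
```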

cc @H-Huang @awgu @wanchaol @fegin @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci

Differential Revision: D85394822

…hat we don't need to compare root mesh

[ghstack-poisoned]
@pytorch-bot pytorch-bot bot added the oncall: distributed (Add this issue/PR to distributed oncall triage queue) label Oct 21, 2025
@pytorch-bot

pytorch-bot bot commented Oct 21, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166003

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 666b105 with merge base 516e589:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

fduwjj added a commit that referenced this pull request Oct 21, 2025
…hat we don't need to compare root mesh

ghstack-source-id: b20c683
Pull Request resolved: #166003
@fduwjj fduwjj requested review from fegin, lw and wconstab October 21, 2025 16:52
@fduwjj fduwjj added the ciflow/trunk (Trigger trunk jobs on your pull request) label Oct 21, 2025
@fduwjj
Contributor Author

fduwjj commented Oct 23, 2025

@pytorchbot merge

@fduwjj
Contributor Author

fduwjj commented Oct 23, 2025

@pytorchbot merge -f "bot not working and all CI green"

@fduwjj
Contributor Author

fduwjj commented Oct 23, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Oct 23, 2025
…SPMD use case (#163358)

Today FSDP needs to slice the SPMD mesh out of the root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But what users essentially want is to concatenate some submeshes into one big mesh and use it as the SPMD mesh. This PR tentatively implements that API for users (a sketch of the current slicing approach follows after this commit message).

One thing to note: all submeshes need to be sliced, flattened, or unflattened from the same root mesh; otherwise the indices make no sense for mesh indexing and device allocation.

Pull Request resolved: #163358
Approved by: https://github.com/fegin
ghstack dependencies: #166003
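For context, here is a hedged sketch of the slicing approach described in the commit message above. The mesh shape and dim names are illustrative assumptions, and it must run inside an initialized 8-rank distributed job; `init_device_mesh` and dim-name slicing are existing `DeviceMesh` APIs, while the concatenate API itself is only described in prose here.

```python
# Sketch of the current approach: slice the SPMD portion out of a root mesh.
# Assumes a distributed job with 8 ranks; the ("pp", "dp", "tp") layout is an
# illustrative assumption, not taken from the FSDP code.
from torch.distributed.device_mesh import init_device_mesh

# 3D root mesh: pipeline x data x tensor parallel.
root_mesh = init_device_mesh("cuda", (2, 2, 2), mesh_dim_names=("pp", "dp", "tp"))

# What FSDP effectively does today: slice a 2D SPMD mesh from the root mesh.
spmd_mesh = root_mesh["dp", "tp"]

# The concatenate API proposed above would instead build the SPMD mesh from
# submeshes such as root_mesh["dp"] and root_mesh["tp"], which only makes
# sense when they all come from the same root mesh.
```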
@fduwjj
Contributor Author

fduwjj commented Oct 24, 2025

@fduwjj has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@clee2000
Contributor

@pytorchbot revert -m "failing internal tests D85405179 I believe there are uses of _flatten_mesh_list internally that need to be updated" -c ghfirst

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Oct 24, 2025
…ist so that we don't need to compare root mesh (#166003)"

This reverts commit 8625ffb.

Reverted #166003 on behalf of https://github.com/clee2000 due to failing internal tests D85405179 I believe there are uses of _flatten_mesh_list internally that need to be updated ([comment](#166003 (comment)))
@pytorchmergebot
Collaborator

@fduwjj your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added the ci-no-td (Do not run TD on this PR) label Oct 24, 2025
pytorchmergebot added a commit that referenced this pull request Oct 24, 2025
…esh and SPMD use case (#163358)"

This reverts commit 5a4997d.

Reverted #163358 on behalf of https://github.com/clee2000 due to probably need to revert this one too, it's stacked with #166003 (comment) ([comment](#163358 (comment)))
@fduwjj
Contributor Author

fduwjj commented Oct 24, 2025

@fduwjj has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

…h_list so that we don't need to compare root mesh"



[ghstack-poisoned]
fduwjj added a commit that referenced this pull request Oct 25, 2025
…hat we don't need to compare root mesh (#166003)


Pull Request resolved: #166003
Approved by: https://github.com/Skylion007, https://github.com/fegin


Internal:
<< DO NOT EDIT BELOW THIS LINE >>

**GitHub Author**: fduwjj <fduwjj@gmail.com> (Meta Employee)
**GitHub Repo**: [pytorch/pytorch](https://github.com/pytorch/pytorch)
**GitHub Pull Request**: [#166003](#166003)

Initially generated by: https://www.internalfb.com/intern/sandcastle/job/9007201528851998/

This was imported as part of a Diff Train.
Please review this as soon as possible. Since it is a direct copy of a commit on
GitHub, there shouldn't be much to do.

Below line forces Sandcastle to run only specified contbuilds.
@build_only[github-export-checks,executorch,pytorch_benchmark,pytorch_benchmark_fb,pytorch_quantization,pytorch_distributed,pytorch_distributed_gpu,pytorch_dynamo,pytorch_inductor,pytorch_inductor_fb,pytorch_functorch,pytorch_fx2trt,pytorch_diff_train_tests_ads,glow_fb_pytorch_tests,training_platform,training_platform_compatibility,training_toolkit_applications,training_toolkit_examples,training_toolkit_model_optimization,dper3_pytorch,xplat_caffe2,pytorch_dev,android-pytorch-instrumentation-tests,smart__pytorch__github_first_try_merge,frl-target-determinator,f6-buck,training_platform_for_github,sigmoid_cpu,sigmoid_gpu,aiplatform_modelprocessing_for_github,accelerators_workloads_models_slimdsnn,ae_aotinductor_benchmark_test,aps_,apf,aps_deterministic_ne_tests,dper_lib_silvertorch,torchrec,torchrec_fb,deeplearning_aot_inductor,aiplatform_modelstore]
#skipfbcodelongtail
#disable_code_coverage
@pytorch-oss-diff-train

diff-train-source-id: 8625ffb

Differential Revision: [D85394822](https://our.internmc.facebook.com/intern/diff/D85394822/)
ghstack-source-id: 318594805
…h_list so that we don't need to compare root mesh"



[ghstack-poisoned]
fduwjj added a commit that referenced this pull request Oct 25, 2025
…hat we don't need to compare root mesh (#166003)


Pull Request resolved: #166003
Approved by: https://github.com/Skylion007, https://github.com/fegin



diff-train-source-id: 8625ffb
ghstack-source-id: 318681631

Differential Revision: [D85394822](https://our.internmc.facebook.com/intern/diff/D85394822/)
…h_list so that we don't need to compare root mesh"



[ghstack-poisoned]
fduwjj added a commit that referenced this pull request Oct 25, 2025
…hat we don't need to compare root mesh (#166003)


Pull Request resolved: #166003
Approved by: https://github.com/Skylion007, https://github.com/fegin


diff-train-source-id: 8625ffb
ghstack-source-id: 79a788c

Differential Revision: [D85394822](https://our.internmc.facebook.com/intern/diff/D85394822/)
@fduwjj
Contributor Author

fduwjj commented Oct 25, 2025

@fduwjj has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@fduwjj
Contributor Author

fduwjj commented Oct 26, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/re-exporting the PR!

Details for Dev Infra team: raised by workflow job.

…h_list so that we don't need to compare root mesh"



[ghstack-poisoned]
fduwjj added a commit that referenced this pull request Oct 26, 2025
…hat we don't need to compare root mesh (#166003)


Pull Request resolved: #166003
Approved by: https://github.com/Skylion007, https://github.com/fegin



diff-train-source-id: 8625ffb
ghstack-source-id: 318735710

Differential Revision: [D85394822](https://our.internmc.facebook.com/intern/diff/D85394822/)
pytorch-bot bot pushed a commit that referenced this pull request Oct 26, 2025
…hat we don't need to compare root mesh (#166003)



imported-using-ghimport

Test Plan: Imported from OSS

Differential Revision: D85526705

Pulled By: fduwjj
pytorchmergebot pushed a commit that referenced this pull request Oct 27, 2025
…hat we don't need to compare root mesh (#166003) (#166264)


imported-using-ghimport

Test Plan: Imported from OSS

Differential Revision: D85526705

Pulled By: fduwjj

Pull Request resolved: #166264
Approved by: https://github.com/XilunWu
@fduwjj
Contributor Author

fduwjj commented Oct 27, 2025

merged in #166264

@fduwjj fduwjj closed this Oct 27, 2025
yiming0416 pushed a commit that referenced this pull request Oct 27, 2025
…hat we don't need to compare root mesh (#166003) (#166264)

yiming0416 pushed a commit that referenced this pull request Oct 27, 2025
…hat we don't need to compare root mesh (#166003) (#166264)

yiming0416 pushed a commit that referenced this pull request Oct 27, 2025
…hat we don't need to compare root mesh (#166003) (#166264)

yiming0416 pushed a commit that referenced this pull request Oct 27, 2025
…hat we don't need to compare root mesh (#166003) (#166264)

tianrengao pushed a commit that referenced this pull request Oct 30, 2025
…hat we don't need to compare root mesh (#166003) (#166264)

@github-actions github-actions bot deleted the gh/fduwjj/228/head branch November 28, 2025 02:17

Labels

ci-no-td (Do not run TD on this PR)
ciflow/trunk (Trigger trunk jobs on your pull request)
fb-exported
Merged
meta-exported
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
release notes: DeviceMesh
Reverted
