[DeviceMesh] Use _flatten_rank_map to replace _flatten_mesh_list so that we don't need to compare root mesh #166003
Conversation
[DeviceMesh] Use _flatten_rank_map to replace _flatten_mesh_list so that we don't need to compare root mesh [ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166003
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 666b105 with merge base 516e589.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge

@pytorchbot merge -f "bot not working and all CI green"

@pytorchbot merge

Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
…SPMD use case (#163358) Today FSDP needs to slice the SPMD mesh out of the root mesh here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_fully_shard/_fsdp_param.py#L301. But essentially, what users want is to concatenate some submeshes into one bigger mesh and use it as the SPMD mesh. This PR tentatively implements that API for users. One thing to note is that all submeshes need to be sliced, flattened, or unflattened from the same root mesh; otherwise the indices make no sense when it comes to mesh indexing and device allocation. Pull Request resolved: #163358 Approved by: https://github.com/fegin ghstack dependencies: #166003
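For illustration, here is a minimal sketch of the use case this targets. The `concatenate`-style helper shown in the comment is hypothetical (it stands in for the API proposed in #163358), and the mesh shape, dim names, and the `_flatten` stand-in are assumptions rather than the exact FSDP code path.

```python
# Sketch of the submesh-concatenation use case from #163358 (hypothetical).
# Assumes torch.distributed is already initialized with 8 ranks.
import torch
from torch.distributed.device_mesh import init_device_mesh

# A 3D root mesh: replicate x shard x tensor-parallel (names are illustrative).
root_mesh = init_device_mesh(
    "cuda", (2, 2, 2), mesh_dim_names=("dp_replicate", "dp_shard", "tp")
)

# Today FSDP slices individual submeshes out of the root mesh ...
dp_shard_mesh = root_mesh["dp_shard"]
tp_mesh = root_mesh["tp"]

# ... but what it really wants is a single SPMD mesh covering both dims.
# With the proposed API this would look roughly like (name is hypothetical):
#   spmd_mesh = concatenate([dp_shard_mesh, tp_mesh])
# and is only meaningful when every submesh comes from the same root mesh.
# A rough stand-in today is slicing both dims and flattening them:
spmd_mesh = root_mesh["dp_shard", "tp"]._flatten("spmd")
```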
@fduwjj has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorchbot revert -m "failing internal tests D85405179 I believe there are uses of _flatten_mesh_list internally that need to be updated" -c ghfirst

@pytorchbot successfully started a revert job. Check the current status here.
Revert "[DeviceMesh] Use _flatten_rank_map to replace _flatten_mesh_list so that we don't need to compare root mesh (#166003)" This reverts commit 8625ffb. Reverted #166003 on behalf of https://github.com/clee2000 due to failing internal tests D85405179 I believe there are uses of _flatten_mesh_list internally that need to be updated ([comment](#166003 (comment)))
@fduwjj your PR has been successfully reverted.
…esh and SPMD use case (#163358)" This reverts commit 5a4997d. Reverted #163358 on behalf of https://github.com/clee2000 due to probably needing to revert this one too, since it's stacked with #166003 (comment) ([comment](#163358 (comment)))
@fduwjj has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@fduwjj has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorchbot merge

Merge failed
Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/re-exporting the PR!
Details for Dev Infra team: Raised by workflow job
[DeviceMesh] Use _flatten_rank_map to replace _flatten_mesh_list so that we don't need to compare root mesh (#166003) (#166264) Summary: Since we already share a flattened tensor `_rank_map` across all meshes derived from the same root mesh, we can simply use a flattened list of it to replace the comparison of root_mesh and flattened_mesh_list (because with the same `_rank_map` and layout, the mesh tensor is guaranteed to be the same). This way we can also give back the CPU overhead added in #164510 and further simplify the code. We do have a more ambitious universe-based change here: #165680, but it needs more discussion and would be BC-breaking. We might eventually merge that PR, but probably not now; this change is not BC-breaking and will help concatenate and the 2D integration with concatenate. cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k pragupta msaroufim dcci imported-using-ghimport Test Plan: Imported from OSS Differential Revision: D85526705 Pulled By: fduwjj Pull Request resolved: #166264 Approved by: https://github.com/XilunWu
merged in #166264
Stack from ghstack (oldest at bottom):
Since we already share a flattened tensor `_rank_map` across all meshes derived from the same root mesh, we can simply use a flattened list of it to replace the comparison of root_mesh and flattened_mesh_list (because with the same `_rank_map` and layout, the mesh tensor is guaranteed to be the same). This way we can also give back the CPU overhead added in #164510 and further simplify the code.

We do have a more ambitious universe-based change here: #165680, but it needs more discussion and would be BC-breaking. We might eventually merge that PR, but probably not now; this change is not BC-breaking and will help concatenate and the 2D integration with concatenate.
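A rough sketch of the idea (the `_MeshInfo` class, the `layout` representation, and the method names below are illustrative assumptions, not the actual DeviceMesh internals): every mesh carved out of the same root shares one `_rank_map`, so a flattened tuple of that map plus the mesh's layout uniquely determines the mesh tensor and can stand in for the old root-mesh-plus-flattened-mesh-list comparison.

```python
# Illustrative sketch only: why (_rank_map, layout) can replace the
# (root_mesh, flattened_mesh_list) comparison. Not the real DeviceMesh code.
import torch


class _MeshInfo:
    def __init__(self, rank_map: torch.Tensor, layout: tuple[tuple[int, int], ...]):
        # rank_map: flat tensor of global ranks shared by every mesh derived
        # from the same root mesh; layout: (size, stride) per mesh dimension.
        self.rank_map = rank_map
        self.layout = layout
        # Flatten once into plain Python so __eq__/__hash__ need no tensor ops
        # (this is the CPU-overhead win the description refers to).
        self._flatten_rank_map = tuple(rank_map.tolist())

    def mesh_tensor(self) -> torch.Tensor:
        # Given the same rank_map and layout, this reconstruction is identical,
        # which is why comparing (rank_map, layout) is sufficient.
        sizes = tuple(size for size, _ in self.layout)
        strides = tuple(stride for _, stride in self.layout)
        return torch.as_strided(self.rank_map, sizes, strides)

    def __eq__(self, other: object) -> bool:
        return (
            isinstance(other, _MeshInfo)
            and self._flatten_rank_map == other._flatten_rank_map
            and self.layout == other.layout
        )

    def __hash__(self) -> int:
        return hash((self._flatten_rank_map, self.layout))


# Two views over the same rank map with the same layout compare equal without
# ever materializing or comparing a root mesh.
rank_map = torch.arange(8)
a = _MeshInfo(rank_map, layout=((2, 4), (4, 1)))
b = _MeshInfo(rank_map.clone(), layout=((2, 4), (4, 1)))
assert a == b and a.mesh_tensor().equal(b.mesh_tensor())
```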
cc @H-Huang @awgu @wanchaol @fegin @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci
Differential Revision: D85394822