[FSDP] Do not clean FQNs even for use_orig_params=True
#91767
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91767
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit d50af79.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Cleaning FQNs for `FullyShardedDataParallel(use_orig_params=True)` can cause discrepancies between the FQNs it reports and those obtained by manually looping over `named_modules()` and `named_parameters()` together. Testing in CI...
@awgu your PR has been successfully reverted.
This reverts commit a383789. Reverted #91767 on behalf of https://github.com/huydhn due to breaking the inductor_distributed workflow: https://hud.pytorch.org/pytorch/pytorch/commit/a383789f4d8ecb36adaff6bd3746430209ff0546
Cleaning FQNs for `FullyShardedDataParallel(use_orig_params=True)` can cause discrepancies between the FQNs it reports and those obtained by manually looping over `named_modules()` and `named_parameters()` together. There is no requirement for the FQNs to be clean when using wrapper FSDP + `use_orig_params=True`, so we can leave clean FQNs to `fully_shard`.
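To make the kind of discrepancy described above concrete, here is a minimal sketch (not from the PR; the helper names are hypothetical) of the two ways of enumerating FQNs being contrasted: reading names directly from `named_parameters()` versus rebuilding them by walking `named_modules()` and each submodule's local parameters.

```python
# Minimal sketch (helper names are hypothetical, not part of FSDP's API):
# two ways of enumerating a model's parameter FQNs.
import torch.nn as nn


def fqns_from_named_parameters(model: nn.Module) -> set:
    # FQNs exactly as reported by named_parameters().
    return {name for name, _ in model.named_parameters()}


def fqns_from_manual_traversal(model: nn.Module) -> set:
    # FQNs rebuilt by joining each submodule's prefix with its local
    # (non-recursive) parameter names.
    fqns = set()
    for module_name, module in model.named_modules():
        for param_name, _ in module.named_parameters(recurse=False):
            fqns.add(f"{module_name}.{param_name}" if module_name else param_name)
    return fqns


# For a plain nn.Module these two sets always match. For a
# FullyShardedDataParallel(..., use_orig_params=True) model (which needs an
# initialized process group to construct), the old cleaning behavior made the
# first set drop the "_fsdp_wrapped_module." prefix while the second kept it,
# so the sets could disagree; with this change, both keep the prefix.
```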
torch/_dynamo/testing.py, `remove_optimized_module_prefix`:

```diff
  assert name.startswith(prefix)
  name = name[len(prefix) :]
- return torch.distributed.fsdp._common_utils.clean_tensor_name(name)
+ return name
```
Context: This PR changes FSDP so that when use_orig_params=True, the parameter names returned from named_parameters() include the _fsdp_wrapped_module. prefixes. Before, FSDP overrode named_parameters() to clean the prefix (see the change in fully_sharded_data_parallel.py). I found that overriding it to clean the prefix creates an unwanted discrepancy with the true module structure, which can lead to some issues. We can leave having clean parameter names to the currently-developed composable APIs (e.g. fully_shard).
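As a string-level illustration (the parameter name below is made up; the only assumption is that `clean_tensor_name()` strips the `_fsdp_wrapped_module.` prefix, as in the diff above):

```python
# Made-up parameter name, used only to show the prefixed vs. cleaned forms.
from torch.distributed.fsdp._common_utils import clean_tensor_name

raw_name = "_fsdp_wrapped_module.layer.0.weight"  # what named_parameters() yields after this PR
print(clean_tensor_name(raw_name))                # "layer.0.weight": the old, cleaned form
```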
This change here in _dynamo/testing.py to not call clean_tensor_name() allows the unit tests to pass. I still wanted to check: does Dynamo + FSDP rely on FSDP's named_parameters() returning the cleaned names (i.e. without _fsdp_wrapped_module.)?
I don't think Dynamo relies on the structure of the name either way, other than we do rely on finding the "is_fsdp_wrapped_module" boolean to enable FSDP handling.
Question: what is the history of the remove_optimized_module_prefix fn? At a glance, I have no memory of modifying this myself for FSDP, but I see it had an FSDP clean-param call in it. Did you put that there before, and are you taking it out now?
remove_optimized_module_prefix came from #89113.
My understanding is as follows:
pytorch/torch/_dynamo/testing.py, lines 63 to 65 at 46a81c8:

```python
for name, param in model.named_parameters():
    if isinstance(model, eval_frame.OptimizedModule):
        name = remove_optimized_module_prefix(name)
```
pytorch/test/distributed/test_dynamo_distributed.py, lines 368 to 370 at 46a81c8:

```python
correct_results = collect_results(eager_model, correct_outputs.logits, correct_loss, inputs_flat)
opt_results = collect_results(opt_model, opt_outputs.logits, opt_loss, inputs_flat)
self.assertTrue(same(correct_results, opt_results))
```
The eager_model removed the prefixes but the opt_model did not, so for parity, we needed to manually remove the prefixes for opt_model. However, with this PR, eager_model no longer removes the prefixes, so similarly, we can remove the manual removal for opt_model.
In this case, we should be safe to land this.
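For reference, a minimal sketch (not from the PR) of where the prefix stripped by `remove_optimized_module_prefix` comes from, assuming the compiled wrapper stores the original module under the `_orig_mod` attribute:

```python
# Sketch: the compiled wrapper registers the original module as a submodule
# named "_orig_mod" (assumption), so its named_parameters() names carry that prefix.
import torch
import torch.nn as nn

eager = nn.Linear(4, 4)
opt = torch.compile(eager)  # returns an eval_frame.OptimizedModule wrapper

print([n for n, _ in eager.named_parameters()])  # ['weight', 'bias']
print([n for n, _ in opt.named_parameters()])    # ['_orig_mod.weight', '_orig_mod.bias']
```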
seems ok to me
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m "Looks like it broke `test_compatible_with_named_optimizer` distributed tests, see https://hud.pytorch.org/pytorch/pytorch/commit/d6f3265e1add26abedb504910be93b393b9fb33c"
@pytorchbot successfully started a revert job. Check the current status here.
@awgu your PR has been successfully reverted.
This reverts commit d6f3265. Reverted #91767 on behalf of https://github.com/malfet due to breaking the `test_compatible_with_named_optimizer` distributed tests; see https://hud.pytorch.org/pytorch/pytorch/commit/d6f3265e1add26abedb504910be93b393b9fb33c
We had a land race. We will re-land this PR after a fix on the optimizer state side.
The last PR (#91767) had a land race relating to `_NamedOptimizer` + FSDP and got reverted. This is a re-land. Pull Request resolved: #92662. Approved by: https://github.com/rohan-varma
Stack from ghstack:
- #92035 [FSDP][RFC] Enforce rank r's current device is `cuda:r`
- #91767 [FSDP] Do not clean FQNs even for `use_orig_params=True` (this PR)
- #92031 [FSDP][BE] Improve `device_id` + CPU offload test
- #92028 [FSDP][BE] Rename `prefixed_param_names` -> `fqns` for consolidation

Cleaning FQN for `FullyShardedDataParallel(use_orig_params=True)` can cause some discrepancies with respect to the FQN compared to manually looping over `named_modules()` and `named_parameters()` together.

There is no requirement for the FQNs to be clean when using wrapper FSDP + `use_orig_params=True`. We can leave clean FQNs to `fully_shard`.

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire