
Conversation

@fegin
Contributor

@fegin fegin commented Dec 21, 2022

Stack from ghstack (oldest at bottom):

What does this PR do?
This PR refactors `_optim_utils.py` to use `_FSDPState` instead of the `FullyShardedDataParallel` class. This change enables optimizer state_dict support for `fully_shard`.
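In sketch form, the shape of the change (the state class, field, and helper below are illustrative stand-ins, not the actual diff): utilities in `_optim_utils.py` stop taking the `FullyShardedDataParallel` wrapper and instead take the internal `_FSDPState`, which both the wrapper class and the composable `fully_shard` API can supply.

```python
# Illustrative sketch only: _FSDPStateSketch stands in for the internal
# _FSDPState, and the helper is a hypothetical example of the signature
# change, not the actual PyTorch code.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class _FSDPStateSketch:
    # Hypothetical field mapping optimizer param keys to parameter FQNs.
    param_key_to_fqns: Dict[int, List[str]] = field(default_factory=dict)


def _get_param_key_to_fqns(state: _FSDPStateSketch) -> Dict[int, List[str]]:
    # The helper only touches the state object, so it no longer cares
    # whether the model was wrapped in FSDP or prepared with fully_shard.
    return dict(state.param_key_to_fqns)


state = _FSDPStateSketch(param_key_to_fqns={0: ["layer.weight"], 1: ["layer.bias"]})
print(_get_param_key_to_fqns(state))  # {0: ['layer.weight'], 1: ['layer.bias']}
```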

@pytorch-bot

pytorch-bot bot commented Dec 21, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91234

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit de7f826:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Contributor

@rohan-varma rohan-varma left a comment

LGTM! Super exciting stuff :D

FSDPInitMode,
FSDPTest,
TransformerWithSharedParams,
)
Contributor

nit: can we have the formatting changes in a separate PR?

I recognize this is tricky, and I think it's time to align on a formatting convention for the FSDP codebase and automate it. cc @awgu

Collaborator

My plan was to just get everyone on `lintrunner` and `lintrunner f` at the beginning of next half. I decided that, since we are cranking PRs with urgency right now, we can just not worry about it. The PR to achieve this looks like #90873. I have re-pushed it recently, but the main change is just in the `.lintrunner.toml` file and making sure all relevant files are compliant.

I do think that unifying under `lintrunner` / `lintrunner f` is nice. Sometimes I add changes to a file that create long lines or add imports, and I want to just auto-format. However, without an agreed-upon auto-formatter, this becomes a problem and actually complicates the workflow.
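For context on the tooling: `lintrunner` drives the linters declared in `.lintrunner.toml`, and `lintrunner f` runs just the formatters and applies their fixes. A hedged sketch of roughly what a formatter entry in that file looks like (fields simplified and patterns illustrative, not the actual #90873 change):

```toml
# Simplified, illustrative entry; the real UFMT entry in PyTorch's
# .lintrunner.toml carries more fields (command, init_command, exclusions).
[[linter]]
code = 'UFMT'
include_patterns = [
    'torch/distributed/fsdp/**/*.py',        # illustrative pattern
    'test/distributed/_composable/**/*.py',  # illustrative pattern
]
is_formatter = true
```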

Contributor Author

Will rebase this PR on top of #91255.

return 2

@skip_if_lt_x_gpu(2)
def _test_optim_state_dict_save_load(self):
Contributor

@rohan-varma rohan-varma Dec 21, 2022

Might be better to just have the test instead of disabling it with the underscore prefix, and to add a skip decorator mentioning the reason it is disabled and file an issue.
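A sketch of that suggestion (the test class name, decorator placement, and skip message are hypothetical):

```python
# Hypothetical pattern: keep the test discoverable and record why it is
# skipped, instead of hiding it behind a leading-underscore name.
import unittest

from torch.testing._internal.common_distributed import skip_if_lt_x_gpu


class TestOptimStateDictSketch(unittest.TestCase):  # illustrative class
    @unittest.skip("disabled until the all_gather_object CI issue is fixed; see tracking issue")
    @skip_if_lt_x_gpu(2)
    def test_optim_state_dict_save_load(self):
        ...  # test body elided
```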

):
_insert_module_state(submodule, state)
# Insert all comm_modules to the module to state mapping.
for submodule in state._fully_sharded_module_to_handles.keys():
Contributor

Is this change equivalent to the former code? If not, what is the reasoning for changing the inserted states?

Contributor Author

This is not equivalent to the former code. The reason behind the change is to map only the modules that actually have handles -- the local root modules.
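In sketch form, the contrast being described (`_fully_sharded_module_to_handles` comes from the quoted hunk; everything else is an illustrative stand-in):

```python
# Illustrative only: contrasts mapping every submodule versus mapping only
# the modules that own handles (the local root modules).

def _map_all_submodules(root_module, state, insert_module_state):
    # Roughly the former behavior: every submodule is mapped to the state.
    for submodule in root_module.modules():
        insert_module_state(submodule, state)


def _map_handle_owning_modules(state, insert_module_state):
    # The new behavior: only modules that actually have handles are mapped,
    # matching the quoted `_fully_sharded_module_to_handles` iteration.
    for submodule in state._fully_sharded_module_to_handles.keys():
        insert_module_state(submodule, state)
```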

mapping between parameters and parameter IDs. Using ``optim_input`` is being
deprecated.
If the optimizer is a ``NamedOptimizer``, the optimizer state_dict does not
Contributor

What if `optim_input` is provided but the optimizer is also a `NamedOptimizer`? Will that create an issue?

Contributor Author

Yes, it will fail. Will add error handling for this.
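A minimal sketch of such a guard (the function name and message are hypothetical, not the exact code that landed):

```python
# Hypothetical guard: reject the deprecated optim_input when the optimizer
# is a NamedOptimizer, since the two key the optimizer state differently.
# The name check stands in for an isinstance check against the real class.
def _check_optim_input_compatible(optim, optim_input):
    if optim_input is not None and type(optim).__name__ == "NamedOptimizer":
        raise ValueError(
            "optim_input (deprecated) cannot be used together with a "
            "NamedOptimizer; please remove the optim_input argument."
        )
```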

composable_optim_state_dict["param_groups"],
):
for key, value in group1.items():
self.assertEqual(value, group2[key])
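For context, a self-contained version of the comparison pattern in the quoted hunk (the two dictionaries are toy stand-ins for the wrapped and composable optimizer state_dicts):

```python
# Toy stand-ins for the two optimizer state_dicts being compared.
wrapped_osd = {"param_groups": [{"lr": 0.1, "params": [0, 1]}]}
composable_osd = {"param_groups": [{"lr": 0.1, "params": [0, 1]}]}

# Same pattern as the quoted assertion: pair up param groups and require
# every key in one group to equal the corresponding key in the other.
for group1, group2 in zip(
    wrapped_osd["param_groups"],
    composable_osd["param_groups"],
):
    for key, value in group1.items():
        assert value == group2[key], f"mismatch for {key!r}"
```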
Contributor

Is it worth adding tests for:

  • non-root FSDP
  • DDP / replicate root
  • nested FSDP + non-root?

Contributor Author

Added an extra test for non-root FSDP. Will add more tests after fixing the `all_gather_object` issue that prevents us from running tests on CI.

fegin added a commit that referenced this pull request Dec 21, 2022
…e and load

ghstack-source-id: 83139d5
Pull Request resolved: #91234
fegin added a commit that referenced this pull request Dec 22, 2022
…e and load

ghstack-source-id: c89755c
Pull Request resolved: #91234
pytorchmergebot pushed a commit that referenced this pull request Dec 23, 2022
… folders (#91255)

This PR applies ufmt to format `_composable`-related code. This is a request from #91234 to separate formatting changes into a new PR.

Pull Request resolved: #91255
Approved by: https://github.com/awgu
@fegin fegin added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 29, 2022
@fegin
Contributor Author

fegin commented Dec 29, 2022

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 additional jobs have failed; the first few of them are: trunk, trunk / linux-focal-rocm5.3-py3.8 / test (default, 2, 2, linux.rocm.gpu)

Details for Dev Infra team: raised by workflow job.

@fegin
Contributor Author

fegin commented Dec 30, 2022

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@facebook-github-bot facebook-github-bot deleted the gh/fegin/55/head branch June 8, 2023 17:16

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), Merged, release notes: distributed (fsdp)
