Conversation


@awgu awgu commented Jan 16, 2023

Stack from ghstack:

Overview
This PR stack will add support for unsharding FSDP's sharded parameters for fully_shard. This PR takes the first step by doing some internal refactoring.

  • The existing API for wrapper FSDP is the static method summon_full_params(), which calls into the helper _summon_full_params().
  • This PR refactors:
    • summon_full_params() core logic to _unshard_params()
    • _summon_full_params() to _unshard_params_recurse(), which has a recurse: bool argument
    • Previous _unshard_params() to _unshard_fsdp_state_params(), which applies to a single FSDP state (see the sketch after this list)
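To make the new layering concrete, here is a minimal sketch of the refactored call structure. The function names come from this PR; the signatures and bodies are illustrative placeholders, not the actual torch.distributed.fsdp internals.

```python
import contextlib
from typing import Any, Iterator

import torch.nn as nn


@contextlib.contextmanager
def _unshard_fsdp_state_params(
    module: nn.Module, state: Any, **kwargs: Any
) -> Iterator[None]:
    """Unshard the parameters of one FSDP state, yield, then reshard (placeholder body)."""
    ...
    yield


@contextlib.contextmanager
def _unshard_params_recurse(
    module: nn.Module, state: Any, recurse: bool, **kwargs: Any
) -> Iterator[None]:
    """What _summon_full_params() becomes; `recurse` selects one state vs. all nested states."""
    if recurse:
        # Illustrative: re-enter this helper for every nested (FSDP state, module) pair.
        ...
        yield
    else:
        with _unshard_fsdp_state_params(module, state, **kwargs):
            yield


@contextlib.contextmanager
def _unshard_params(
    module: nn.Module, recurse: bool = True, **kwargs: Any
) -> Iterator[None]:
    """Core logic formerly in summon_full_params(), shared by wrapper FSDP and fully_shard."""
    # Illustrative: enter _unshard_params_recurse() for each root FSDP state under `module`.
    ...
    yield
```

For reference, the existing public entry point remains FSDP.summon_full_params(module, ...), whose core logic would now route through _unshard_params().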

Details

  • This PR introduces _get_fsdp_states_with_modules() and _get_root_fsdp_states_with_modules(), which additionally return the modules along with the FSDP states. The modules are needed for handling FlatParameter registration.
    • We may be able to remove this if we clean up the use_orig_params=True vs. False code paths: for True, the FlatParameter is not registered, so it does not need to be de-registered.
    • Since fully_shard requires use_orig_params=True, we may not need _get_fsdp_states_with_modules() and _get_root_fsdp_states_with_modules(); however, I prefer to make the separation of FSDP state and module explicit for now, for clarity (see the sketch after this list).
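As a point of reference, a hypothetical shape for the new helper is sketched below; Any stands in for the internal FSDP state type, and the actual traversal and return details may differ from this illustration.

```python
from typing import Any, List, Tuple

import torch.nn as nn


def _get_fsdp_states_with_modules(
    root_module: nn.Module,
) -> Tuple[List[Any], List[nn.Module]]:
    """Return each FSDP state paired with the module it manages (illustrative only)."""
    states: List[Any] = []
    modules: List[nn.Module] = []
    # Placeholder traversal: collect every (state, module) pair under `root_module`.
    ...
    return states, modules
```

Having the module alongside the state is what lets the unshard path de-register and later re-register the FlatParameter in the use_orig_params=False case, where the flat parameter is registered on the module.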

Follow-Ups

  • Specifying writeback=True together with rank0_only=True raises an error. The previous explanation was:

is not supported, as model parameter shapes will be different across ranks, and writing to them can lead to inconsistencies across ranks when the context is exited.

I am not exactly sure what "different model parameter shapes" refers to. However, I believe we can support writeback=True with rank0_only=True by broadcasting the FlatParameter from rank 0 in the finally block, writing back, and then freeing it. This should not increase peak memory: rank 0 already holds the unsharded FlatParameter in GPU memory before writing back, and nonzero ranks do not hold any other unsharded FlatParameters in GPU memory.
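A minimal sketch of that proposed follow-up, assuming a plain broadcast of the unsharded flat parameter; the function name and arguments are placeholders for illustration, not FSDP's actual write-back machinery.

```python
import torch
import torch.distributed as dist


def _writeback_with_rank0_only(flat_param: torch.Tensor, group: dist.ProcessGroup) -> None:
    """Broadcast rank 0's unsharded flat parameter so every rank can write back its shard."""
    # Rank 0 already holds the unsharded flat parameter in GPU memory, and nonzero ranks
    # receive it into a single unsharded buffer, so per the reasoning above this should
    # not increase peak memory.
    dist.broadcast(flat_param, src=0, group=group)
    # Placeholder: each rank would then write back its local shard from `flat_param`
    # and free the unsharded buffer, mirroring the existing writeback path.
    ...
```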


pytorch-bot bot commented Jan 16, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/92236

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit d6668dc:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the release notes: distributed (fsdp) label Jan 16, 2023
awgu pushed a commit that referenced this pull request Jan 16, 2023
ghstack-source-id: 4f6b984
Pull Request resolved: #92236
awgu pushed a commit to awgu/pytorch that referenced this pull request Jan 17, 2023
@awgu awgu closed this Jan 17, 2023
@facebook-github-bot facebook-github-bot deleted the gh/awgu/297/head branch June 8, 2023 15:33