[FSDP()][9/N] Refactor ctor (continued) #87923
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/87923.
Note: Links to docs will display an error until the docs builds have completed.
❌ 1 failure, 3 pending as of commit 5a5cd73.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR makes a second pass over the constructor. The logic has been grouped into `_init_<...>` functions based on intent (e.g. `_init_prefetching_state()` or `_init_runtime_state()`). This makes the initialization code for composable FSDP much cleaner than rewriting the same sequences of lower-level helper calls. This PR also moves `_ExecOrderData` into its own file, `_exec_order_utils.py`.
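To illustrate the shape of this refactor, here is a minimal sketch of the intent-grouped initializer pattern. Only the `_init_prefetching_state()` / `_init_runtime_state()` names come from the PR description; the class name, parameters, and helper bodies are illustrative stand-ins, not the actual `torch.distributed.fsdp` internals.

```python
import torch.nn as nn

# Sketch of the intent-grouped constructor pattern described above.
# SketchFSDP and its attributes are hypothetical, not FSDP's real internals.
class SketchFSDP(nn.Module):
    def __init__(self, module: nn.Module, backward_prefetch: bool = True) -> None:
        super().__init__()
        self._module = module
        # Each _init_<...> helper owns one coherent slice of state, so a
        # composable (non-wrapper) frontend can call the same helpers in the
        # same order instead of repeating lower-level helper sequences.
        self._init_prefetching_state(backward_prefetch)
        self._init_runtime_state()

    def _init_prefetching_state(self, backward_prefetch: bool) -> None:
        # Groups all prefetching-related state in one place.
        self._backward_prefetch = backward_prefetch
        self._handles_prefetched: dict = {}

    def _init_runtime_state(self) -> None:
        # Groups runtime bookkeeping (streams, callback flags, ...).
        self._streams: dict = {}
        self._post_backward_callback_queued = False
```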
```python
FSDP_SYNCED = "_fsdp_synced"

# TODO (awgu): Refactor this later
SHARDING_STRATEGY_MAP = {
```
Curious: why does it need this map if the key and value are the same?
The key is the public-facing `ShardingStrategy`; the value is the private/internal `HandleShardingStrategy`. I think we should be able to refactor later to only use the public-facing `ShardingStrategy`. I had two separate enums to avoid circular import issues a while ago, but now that `ShardingStrategy` is in its own file `api.py`, hopefully this can be resolved.
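For context, a minimal sketch of what such a mirrored-enum map can look like. The member names are illustrative (borrowed from FSDP's public sharding strategies); this is not the actual map from the diff.

```python
from enum import Enum, auto

class ShardingStrategy(Enum):
    # Public-facing strategy enum (lives in api.py per the comment above).
    FULL_SHARD = auto()
    SHARD_GRAD_OP = auto()
    NO_SHARD = auto()

class HandleShardingStrategy(Enum):
    # Private/internal mirror used by the flat-parameter handle code.
    FULL_SHARD = auto()
    SHARD_GRAD_OP = auto()
    NO_SHARD = auto()

# Keys and values "look the same" but belong to different enums; the
# indirection decouples the public API module from the internal one and
# could be dropped once only ShardingStrategy is used everywhere.
SHARDING_STRATEGY_MAP = {
    ShardingStrategy.FULL_SHARD: HandleShardingStrategy.FULL_SHARD,
    ShardingStrategy.SHARD_GRAD_OP: HandleShardingStrategy.SHARD_GRAD_OP,
    ShardingStrategy.NO_SHARD: HandleShardingStrategy.NO_SHARD,
}
```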
Ideally we still want to run the basic FSDP benchmarks to sanity check that there is no crash and no regression before landing these PRs, since they make large changes to the core code.
Pull Request resolved: pytorch#87923
Approved by: https://github.com/mrshenli
Stack from ghstack:

- #88123 [FSDP] Rename `unflat_param_name` -> `fqn` for consistency
- #88122 [FSDP] Simplify `_get_buffer_names()`
- #88121 [FSDP] Remove unneeded `torch.no_grad()` context when offloading to CPU
- #87941 [FSDP()][26/N] Move `_lazy_init()` into `_fsdp_root_pre_forward()`
- #87940 [FSDP()][25/N] Add `_post_forward_reshard()`
- #87939 [FSDP()][24/N] Refactor `_lazy_init()`
- #87937 [FSDP] Simplify `_reset_lazy_init()`
- #87936 [FSDP()][22/N] Refactor `_cast_buffers()` in `_lazy_init()`
- #87935 [FSDP()][21/N] Refactor `_cast_buffers()`
- `_buffer_name_to_orig_dtype` computation
- #87934 [FSDP] Rename `dtype` to `buffer_name_to_dtype`
- #87933 [FSDP] Remove `device` arg from `_cast_buffers()`
- #87931 [FSDP()][18/N] Refactor `pre_forward_unshard()`
- #87930 [FSDP()][17/N] Refactor `_fsdp_root_pre_forward()`
- #87928 [FSDP()][15/N] Refactor `_init_streams()`
- #87923 [FSDP()][9/N] Refactor ctor (continued) (this PR)