Conversation

awgu (Collaborator) commented Dec 20, 2022

Stack from ghstack:

Closes #90838.

To make mixed precision precise internally, #90660 changed the implementation to save `_orig_param_dtype`, `_low_prec_param_dtype`, and `_reduce_dtype` explicitly. However, these are computed at FSDP construction time, so the user could not change the model dtype after FSDP construction but before lazy initialization. This PR recomputes those dtype attributes as needed if the model dtype changes in that window.

Note that any mixed precision settings specified by the user take precedence over the model dtype.
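As a minimal sketch (not taken from the PR's test code) of the usage this enables, assuming `torch.distributed` has already been initialized (e.g. via `torchrun`) and a CUDA device is available:

```python
# Sketch of the scenario this PR supports: the model dtype changes after FSDP
# construction but before lazy initialization (the first forward pass).
# Assumes an initialized process group and an available CUDA device.
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision

model = nn.Linear(8, 8).cuda()  # stand-in for a real model, built in fp32

# No mixed precision is passed here, so FSDP derives its dtype attributes
# from the model's parameter dtype.
fsdp_model = FSDP(model)

# Cast the model after FSDP construction but before the first forward pass.
# With this PR, _orig_param_dtype/_low_prec_param_dtype/_reduce_dtype are
# recomputed to reflect the new dtype instead of staying at fp32.
fsdp_model = fsdp_model.to(torch.bfloat16)
out = fsdp_model(torch.randn(4, 8, device="cuda", dtype=torch.bfloat16))

# If the user passes an explicit MixedPrecision config, those settings take
# precedence over whatever dtype the model was cast to.
mp_model = FSDP(
    nn.Linear(8, 8).cuda().to(torch.bfloat16),
    mixed_precision=MixedPrecision(param_dtype=torch.float16),
)
```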

pytorch-bot bot commented Dec 20, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91192

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f2ea4a2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

awgu pushed a commit to awgu/pytorch that referenced this pull request Dec 20, 2022
awgu pushed a commit to awgu/pytorch that referenced this pull request Dec 21, 2022
awgu added the ciflow/trunk label Dec 21, 2022
awgu pushed a commit to awgu/pytorch that referenced this pull request Jan 11, 2023
Labels: ciflow/trunk, Merged, release notes: distributed (fsdp), topic: improvements