Conversation

awgu (Collaborator) commented Dec 20, 2022

Stack from ghstack:

Closes #90838.

To make mixed precision precise internally, #90660 changed the implementation to save `_orig_param_dtype`, `_low_prec_param_dtype`, and `_reduce_dtype` explicitly. However, these are computed at FSDP construction time, so the user could not change the model dtype after FSDP construction but before lazy initialization. This PR recomputes those dtype attributes as needed if the model dtype changes in that window.

Note that any mixed precision settings specified by the user take precedence over the model dtype.
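As a minimal sketch (not taken from the PR's test code) of the usage this enables, assuming `torch.distributed` has already been initialized (e.g. via `torchrun`) and a CUDA device is available:

```python
# Sketch of the scenario this PR supports: the model dtype changes after FSDP
# construction but before lazy initialization (the first forward pass).
# Assumes an initialized process group and an available CUDA device.
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision

model = nn.Linear(8, 8).cuda()  # stand-in for a real model, built in fp32

# No mixed precision is passed here, so FSDP derives its dtype attributes
# from the model's parameter dtype.
fsdp_model = FSDP(model)

# Cast the model after FSDP construction but before the first forward pass.
# With this PR, _orig_param_dtype/_low_prec_param_dtype/_reduce_dtype are
# recomputed to reflect the new dtype instead of staying at fp32.
fsdp_model = fsdp_model.to(torch.bfloat16)
out = fsdp_model(torch.randn(4, 8, device="cuda", dtype=torch.bfloat16))

# If the user passes an explicit MixedPrecision config, those settings take
# precedence over whatever dtype the model was cast to.
mp_model = FSDP(
    nn.Linear(8, 8).cuda().to(torch.bfloat16),
    mixed_precision=MixedPrecision(param_dtype=torch.float16),
)
```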

pytorch-bot bot commented Dec 20, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91192

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f2ea4a2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

awgu pushed a commit to awgu/pytorch that referenced this pull request Dec 20, 2022
awgu pushed a commit to awgu/pytorch that referenced this pull request Dec 21, 2022
awgu added the ciflow/trunk label Dec 21, 2022
awgu pushed a commit to awgu/pytorch that referenced this pull request Jan 11, 2023
Labels: ciflow/trunk, Merged, release notes: distributed (fsdp), topic: improvements