[FSDP] Doc to explain running submodules #86343
Conversation
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86343
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Failure
As of commit 2f96190, the following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
awgu
left a comment
Thanks for adding this note!
> .. note:
>     Attempting to run the forward pass of a submodule that is contained in an
>     FSDP unit is not supported and will result in errors. This is because the
nit: I think maybe we should converge to just saying FSDP instance (where instance is the technical OOP term).
Suggested change:
-   FSDP unit is not supported and will result in errors. This is because the
+   FSDP instance is not supported and will result in errors. This is because the
>     Attempting to run the forward pass of a submodule that is contained in an
>     FSDP unit is not supported and will result in errors. This is because the
>     submodule's parameters will be sharded, but it itself is not an FSDP instance,
>     so its forward pass will not materialize the full parameters appropriately.
nit: just a suggestion
Suggested change:
-   so its forward pass will not materialize the full parameters appropriately.
+   so its forward pass will not all-gather the full parameters appropriately.
>     submodule's parameters will be sharded, but it itself is not an FSDP instance,
>     so its forward pass will not materialize the full parameters appropriately.
>     This could potentially happen when attempting to run only the encoder of a
>     encoder-decoder model, and the encoder is not wrapped in its own FSDP unit. To
Suggested change:
-   encoder-decoder model, and the encoder is not wrapped in its own FSDP unit. To
+   encoder-decoder model, and the encoder is not wrapped in its own FSDP instance. To
>     so its forward pass will not materialize the full parameters appropriately.
>     This could potentially happen when attempting to run only the encoder of a
>     encoder-decoder model, and the encoder is not wrapped in its own FSDP unit. To
>     resolve this, please wrap the submodule in its own FSDP unit.
Suggested change:
-   resolve this, please wrap the submodule in its own FSDP unit.
+   resolve this, please wrap the submodule in its own FSDP instance.
failure is unrelated and probably flaky

@pytorchbot merge -f "Test failure unrelated"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Hey @rohan-varma.
Summary: Pull Request resolved: #86343
Approved by: https://github.com/awgu
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/f0977c4658c6f8c10e3342cf9a0249d5d23a3505
Reviewed By: seemethere
Differential Revision: D40167195
Pulled By: rohan-varma
fbshipit-source-id: 04b71cfa79da0b50a12815ecdda99a139bf4723b
Stack from ghstack (oldest at bottom):