-
Notifications
You must be signed in to change notification settings - Fork 26.3k
[PT-D][FSDP] Implement _clip_grad_norm_ for FSDP #73405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Implement the `_clip_grad_norm_` for FSDP, issue: #72548 Differential Revision: [D34230605](https://our.internmc.facebook.com/intern/diff/D34230605/) [ghstack-poisoned]
🔗 Helpful links
💊 CI failures summary and remediationsAs of commit a87e2f9 (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Please report bugs/suggestions to the (internal) Dr. CI Users group. |
CI Flow Status⚛️ CI FlowRuleset - Version:
|
Implement the `_clip_grad_norm_` for FSDP, issue: #72548 Differential Revision: [D34230605](https://our.internmc.facebook.com/intern/diff/D34230605/) ghstack-source-id: 149925805 Pull Request resolved: #73405
rohan-varma
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome work, thanks so much for persisting through all the usability issues and giving feedback on them offline! Added some comments for your consideration.
zhaojuanmao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work! thanks for adding this.
for tests, would you please add a norm_type config to _train_for_several_steps() in common_fsdp.py and compare ddp vs fsdp parity, similar to fairscale tests in _train_for_several_steps of test_fsdp.py for test_clip_norm_transformer and test_mixture_of_experts_grad_clip_breaks
Implement the `_clip_grad_norm_` for FSDP, issue: #72548 Differential Revision: [D34230605](https://our.internmc.facebook.com/intern/diff/D34230605/) [ghstack-poisoned]
Implement the `_clip_grad_norm_` for FSDP, issue: #72548 Differential Revision: [D34230605](https://our.internmc.facebook.com/intern/diff/D34230605/) [ghstack-poisoned]
Pull Request resolved: #73405 Implement the `_clip_grad_norm_` for FSDP, issue: #72548 ghstack-source-id: 150655449 Differential Revision: [D34230605](https://our.internmc.facebook.com/intern/diff/D34230605/)
|
Addressed the comment from reviewers. |
Implement the `_clip_grad_norm_` for FSDP, issue: #72548 Differential Revision: [D34230605](https://our.internmc.facebook.com/intern/diff/D34230605/) [ghstack-poisoned]
Pull Request resolved: #73405 Implement the `_clip_grad_norm_` for FSDP, issue: #72548 ghstack-source-id: 150951935 Differential Revision: [D34230605](https://our.internmc.facebook.com/intern/diff/D34230605/)
|
Adding the test coverage for the nested model. |
zhaojuanmao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks great to me, just one minor comment, will let @rohan-varma to accept it.
Implement the `_clip_grad_norm_` for FSDP, issue: #72548 Differential Revision: [D34230605](https://our.internmc.facebook.com/intern/diff/D34230605/) [ghstack-poisoned]
Pull Request resolved: #73405 Implement the `_clip_grad_norm_` for FSDP, issue: #72548 ghstack-source-id: 150990293 Differential Revision: [D34230605](https://our.internmc.facebook.com/intern/diff/D34230605/)
|
Further address reviewer's comment. |
rohan-varma
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work, thanks for persisting through all of the issues and addressing all the comments! Looks great to ship.
Implement the `_clip_grad_norm_` for FSDP, issue: #72548 Differential Revision: [D34230605](https://our.internmc.facebook.com/intern/diff/D34230605/) [ghstack-poisoned]
Pull Request resolved: #73405 Implement the `_clip_grad_norm_` for FSDP, issue: #72548 ghstack-source-id: 151059433 Differential Revision: [D34230605](https://our.internmc.facebook.com/intern/diff/D34230605/)
|
Hey @fduwjj. |
Stack from ghstack (oldest at bottom):
Implement the
_clip_grad_norm_for FSDP, issue: #72548Differential Revision: D34230605