make sharding strategy configurable and support zero2 algorithm #73819
Adds a new sharding_strategy config to the FSDP API so that different data-parallel algorithms can be selected. Also adds support for the ZeRO-2 algorithm, which shards only optimizer states and gradients. Differential Revision: [D34662583](https://our.internmc.facebook.com/intern/diff/D34662583/)
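To make the difference between the strategies concrete, here is a rough sketch of how many tensor elements each rank keeps resident under three options. It is only an illustration: the `Strategy` enum, the `per_rank_elements` function, and the default of two optimizer states per parameter (as with Adam) are assumptions made for this example, not the API added in this PR. The model ignores activations, communication buffers, and any temporarily gathered parameters.

```python
from enum import Enum


class Strategy(Enum):
    """Hypothetical stand-in for a sharding-strategy option (not the FSDP API)."""
    NO_SHARD = "no_shard"            # plain data parallel: everything replicated
    SHARD_GRAD_OP = "shard_grad_op"  # ZeRO-2 style: shard grads and optimizer states
    FULL_SHARD = "full_shard"        # additionally shard the parameters


def per_rank_elements(num_params, world_size, strategy, optim_states_per_param=2):
    """Approximate number of elements resident on one rank.

    Assumes one gradient element per parameter and a fixed number of
    optimizer-state elements per parameter (2 resembles Adam).
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    params = num_params
    grads = num_params
    optim = optim_states_per_param * num_params
    if strategy is Strategy.NO_SHARD:
        # Every rank holds a full copy of everything.
        return params + grads + optim
    if strategy is Strategy.SHARD_GRAD_OP:
        # Parameters stay whole; grads and optimizer states are split across ranks.
        return params + (grads + optim) / world_size
    if strategy is Strategy.FULL_SHARD:
        # Parameters, grads, and optimizer states are all split across ranks.
        return (params + grads + optim) / world_size
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    for s in Strategy:
        print(s.name, per_rank_elements(1000, 4, s))
```

Under these assumptions, sharding only gradients and optimizer states already removes most of the replicated memory, while parameters remain whole on every rank.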
rohan-varma left a comment
Looks great, excited to have users try out Zero-2!
Stamping to unblock, but some tests seem to be failing.
Summary:
Pull Request resolved: #73819

Adds a new sharding_strategy config to the FSDP API so that different data-parallel algorithms can be selected. Also adds support for the ZeRO-2 algorithm, which shards only optimizer states and gradients.

ghstack-source-id: 151454460

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D34662583

fbshipit-source-id: 14c6e0c0054692ecd76512c025d60deb4964ec5f