[FSDP][1/N] Update summon_full_params(with_grads) None gradient
#87314
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/87314
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 988a6ea
This comment was automatically generated by Dr. CI and updates every 15 minutes.
… gradient" This PR changes `summon_full_params(with_grads=True)`'s behavior to be such that if all ranks have `flat_param.grad = None`, then the original parameters will correctly have `orig_param.grad = None`. This is achieved with a preliminary all-reduce. Note that if a particular original parameter's gradient is `None` on all of the containing ranks, but not all ranks' `flat_param.grad = None`, then that particular gradient is still going to be set to zeros. This can be handled if desired in follow-up work. [ghstack-poisoned]
@torch.no_grad()
def unshard_grad(self):
    """
    Unshards the handle's ``FlatParameter`` 's gradient. If all ranks have
nit: maybe add a comment that `unshard_grad` is not on the critical path and is only used by `summon_full_params()`, since it calls all-reduce and may have a performance impact
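For context, here is a minimal usage sketch of the path this method serves, `summon_full_params(with_grads=True)`. The toy `Linear` model, the `.cuda()` call, and the assumption that the process group is already initialized are illustrative, not part of this PR:

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes torch.distributed is already initialized (e.g. via torchrun)
# and that each rank has set its CUDA device; the Linear model is a toy.
model = FSDP(torch.nn.Linear(8, 8).cuda())

# No backward pass has run yet, so flat_param.grad is None on every rank.
with FSDP.summon_full_params(model, with_grads=True):
    for param in model.parameters():
        # With this PR, the original parameters keep grad = None instead of
        # having zero gradients materialized for them.
        assert param.grad is None
```

The extra all-reduce cost is therefore confined to explicit `summon_full_params(with_grads=True)` calls rather than the regular forward/backward path, which is the reviewer's point above.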
ghstack-source-id: a0651cd Pull Request resolved: pytorch#87314
…ytorch#87314) This PR changes `summon_full_params(with_grads=True)`'s behavior to be such that if all ranks have `flat_param.grad = None`, then the original parameters will correctly have `orig_param.grad = None`. This is achieved with a preliminary all-reduce. Note that if a particular original parameter's gradient is `None` on all of the containing ranks, but not all ranks' `flat_param.grad = None`, then that particular gradient is still going to be set to zeros. This can be handled if desired in follow-up work. Pull Request resolved: pytorch#87314 Approved by: https://github.com/zhaojuanmao
Stack from ghstack:
- #87308 [FSDP][2/N] Fix grad zero vs. `None` edge case
- #87314 [FSDP][1/N] Update `summon_full_params(with_grads)` `None` gradient

This PR changes `summon_full_params(with_grads=True)`'s behavior to be such that if all ranks have `flat_param.grad = None`, then the original parameters will correctly have `orig_param.grad = None`. This is achieved with a preliminary all-reduce. Note that if a particular original parameter's gradient is `None` on all of the containing ranks, but not all ranks' `flat_param.grad = None`, then that particular gradient is still going to be set to zeros. This can be handled if desired in follow-up work.
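For illustration, a minimal sketch of the preliminary all-reduce described above; the standalone helper `_all_ranks_have_no_grad` and its exact placement are assumptions rather than the actual FSDP internals:

```python
import torch
import torch.distributed as dist

def _all_ranks_have_no_grad(flat_param: torch.nn.Parameter,
                            process_group) -> bool:
    """Hypothetical helper: returns True only if flat_param.grad is None
    on every rank, which is when orig_param.grad should stay None."""
    # Each rank contributes 1 if it holds a sharded gradient, 0 otherwise.
    num_grads = torch.zeros(1, device=flat_param.device)
    if flat_param.grad is not None:
        num_grads += 1
    dist.all_reduce(num_grads, group=process_group)  # defaults to SUM
    # If the sum is 0, no rank has a gradient, so keep grad = None;
    # otherwise, ranks whose shard had no gradient fall back to zeros,
    # which is the remaining edge case mentioned above.
    return bool(num_grads.item() == 0)
```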