[FSDP] Fix clip_grad_norm_() for low prec grads
#90028
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90028
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit d9fd5d8.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
For PyTorch FSDP, the only way that gradients are in low precision is if `keep_low_precision_grads=True` or if the user turns on AMP. This PR adds tests for the former and improves the documentation for `clip_grad_norm_()`, especially around these non-full-precision cases.
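For context, here is a minimal sketch (not taken from the PR's tests) of the configuration this change covers: gradients kept in low precision via `MixedPrecision(keep_low_precision_grads=True)`, with clipping done through the FSDP root's `clip_grad_norm_()`. The model, optimizer, and shapes are illustrative placeholders, and it assumes a process group has already been initialized (e.g., via `torchrun` and `init_process_group`).

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

# Assumes torch.distributed is already initialized and a CUDA device is available.
model = FSDP(
    nn.Linear(8, 8).cuda(),
    mixed_precision=MixedPrecision(
        param_dtype=torch.float16,
        reduce_dtype=torch.float16,
        keep_low_precision_grads=True,  # gradients stay in fp16 after reduction
    ),
)
optim = torch.optim.SGD(model.parameters(), lr=1e-2)

loss = model(torch.randn(4, 8, device="cuda")).sum()
loss.backward()
# Use the FSDP method rather than torch.nn.utils.clip_grad_norm_ when gradients
# are sharded: the total norm is computed across ranks before clipping.
total_norm = model.clip_grad_norm_(max_norm=1.0)
optim.step()
```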
rohan-varma left a comment:
Great catch! Do we know why this was not picked up by unittests earlier?
applied per subset of model parameters.
.. note:: If every FSDP instance uses ``NO_SHARD``, meaning that no
    gradients are sharded across ranks, then you may directly use
    :func:`torch.nn.utils.clip_grad_norm_`.
Can we warn explicitly about this?
I have it so that if all instances use `NO_SHARD`, then this method returns `torch.nn.utils.clip_grad_norm_()`, so it is equivalent.
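To make that equivalence concrete, here is a rough sketch of the dispatch described above. It is not the actual FSDP source; `clip_grad_norm_sketch` is a hypothetical helper, and it only assumes the public `FSDP.fsdp_modules()` helper and the `sharding_strategy` attribute on FSDP instances.

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

def clip_grad_norm_sketch(fsdp_root: FSDP, max_norm: float, norm_type: float = 2.0):
    # If every FSDP instance uses NO_SHARD, no gradients are sharded across ranks,
    # so clipping can fall back to the standard utility.
    all_no_shard = all(
        m.sharding_strategy == ShardingStrategy.NO_SHARD
        for m in FSDP.fsdp_modules(fsdp_root)
    )
    if all_no_shard:
        # Equivalent to calling torch.nn.utils.clip_grad_norm_ directly.
        return torch.nn.utils.clip_grad_norm_(
            fsdp_root.parameters(), max_norm, norm_type
        )
    # Otherwise, the sharded path computes local norms, reduces them across ranks,
    # and scales the (possibly low precision) gradients accordingly.
    ...
```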
We did not test this case before.
ghstack-source-id: 5ae7c09 Pull Request resolved: pytorch#90028
@pytorchbot rebase -s
@pytorchbot successfully started a rebase job. Check the current status here.
Successfully rebased.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch#90028
Approved by: https://github.com/rohan-varma
Stack from ghstack (oldest at bottom):
- [FSDP] Fix `clip_grad_norm_()` for low prec grads #90028
- `keep_low_precision_grads=True` for `use_orig_params=True` #90027