@awgu awgu commented Oct 21, 2022

Stack from ghstack:

This PR removes the property `params_with_grad` from `FullyShardedDataParallel`. It was introduced when implementing `clip_grad_norm_()` but was not consistently used. Personally, I do not think it makes sense for `FullyShardedDataParallel` to expose this helper because it is not a common paradigm.

This PR is technically BC-breaking. However, I checked that no one internally is using this API.

cc @ezyang @gchanan

@pytorch-bot pytorch-bot bot added the release notes: distributed (sharded) release notes category label Oct 21, 2022
awgu pushed a commit that referenced this pull request Oct 21, 2022
ghstack-source-id: 0a98d57
Pull Request resolved: #87480
@awgu awgu added release notes: distributed (fsdp) release notes category and removed release notes: distributed (sharded) release notes category labels Oct 21, 2022
@awgu awgu changed the title [FSDP] Remove params_with_grad [FSDP][2/N] Remove params_with_grad Oct 21, 2022
awgu pushed a commit that referenced this pull request Oct 22, 2022
ghstack-source-id: b75d378
Pull Request resolved: #87480
@rohan-varma rohan-varma left a comment

Sounds good. Please add the BC-breaking label for release tracking purposes.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 24, 2022
@awgu awgu added module: bc-breaking Related to a BC-breaking change topic: bc breaking topic category labels Oct 24, 2022
awgu pushed a commit that referenced this pull request Oct 24, 2022
ghstack-source-id: d3ce1d3
Pull Request resolved: #87480
awgu commented Oct 24, 2022

@pytorchbot merge

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Nov 5, 2022
Pull Request resolved: pytorch#87480
Approved by: https://github.com/rohan-varma
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Pull Request resolved: pytorch#87480
Approved by: https://github.com/rohan-varma
@facebook-github-bot facebook-github-bot deleted the gh/awgu/140/head branch June 8, 2023 15:23

Labels

ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
module: bc-breaking (Related to a BC-breaking change)
release notes: distributed (fsdp) (release notes category)
topic: bc breaking (topic category)
