
Conversation

Contributor

@janeyx99 janeyx99 commented Mar 1, 2023

Big OOP correction continued. Also added a test this time to verify the defaulting was as expected.

The key here is realizing that the grouping for foreach already assumes that the non-param tensorlists follow suit in dtype and device, so it is too narrow to check that all tensors are on CUDA. The main leeway this allowed was state_steps, which are sometimes CPU tensors. Since foreach can handle CPU tensors, this should not introduce breakage.
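
To make the widened check concrete, here is a minimal sketch; `_default_to_foreach` is a hypothetical helper name rather than the actual optimizer internal, and the logic is reduced to the point above: only the params need to be CUDA tensors, while state_steps may stay on CPU and are intentionally ignored by the check.

```python
import torch
from typing import List


def _default_to_foreach(params: List[torch.Tensor],
                        state_steps: List[torch.Tensor]) -> bool:
    """Hypothetical, simplified version of the defaulting check.

    Only the params are inspected: the foreach grouping already assumes
    the non-param tensorlists follow the params' dtype/device, and
    state_steps are allowed to be CPU tensors, so they are ignored here.
    """
    if torch.jit.is_scripting():
        return False
    return len(params) > 0 and all(p.is_cuda for p in params)


# CUDA params with CPU state_steps should still default to the foreach path.
if torch.cuda.is_available():
    params = [torch.randn(4, device="cuda") for _ in range(3)]
    state_steps = [torch.zeros((), device="cpu") for _ in range(3)]  # per-param step counters
    assert _default_to_foreach(params, state_steps)
```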

Stack from ghstack (oldest at bottom):


pytorch-bot bot commented Mar 1, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/95820

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d9f191b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: nn release notes category label Mar 1, 2023
@janeyx99 janeyx99 added ciflow/trunk Trigger trunk jobs on your pull request ciflow/inductor labels Mar 1, 2023
janeyx99 added a commit that referenced this pull request Mar 1, 2023
ghstack-source-id: 60a53ba
Pull Request resolved: #95820
Collaborator

@albanD albanD left a comment

Sounds good!
Did the doc PR land? Does the phrasing on when we use foreach by default need to be changed?

Contributor Author

janeyx99 commented Mar 2, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Contributor Author

janeyx99 commented Mar 2, 2023

Sounds good! Did the doc PR land? Does the phrasing on when we use foreach by default need to be changed?

Doc PR landed last week; the phrasing could be updated slightly from "tensors" to "parameters", but that's not really significant.

janeyx99 added a commit to janeyx99/pytorch that referenced this pull request Mar 2, 2023
Big OOP correction continued. Also added a test this time to verify the defaulting was as expected.

The key here is realizing that the grouping for foreach already assumes that the non-param tensorlists follow suit in dtype and device, so it is too narrow to check that _all_ tensors are on CUDA. The main leeway this allowed was state_steps, which are sometimes CPU tensors. Since foreach _can_ handle CPU tensors, this should not introduce breakage.

Pull Request resolved: pytorch#95820
Approved by: https://github.com/albanD
atalman pushed a commit that referenced this pull request Mar 2, 2023
* [optim] include nn.Parameter as foreach supported (#95811)

This PR is a result of a realization that models are NOT subscribed to the foreach defaulting, as has been claimed in our documentation for months now. BIG OOPS.

Pull Request resolved: #95811
Approved by: https://github.com/albanD

* [optim] Widen the cases for defaulting to foreach (#95820)

Big OOP correction continued. Also added a test this time to verify the defaulting was as expected.

The key here is realizing that the grouping for foreach already assumes that the non-param tensorlists follow suit in dtype and device, so it is too narrow to check that _all_ tensors are on CUDA. The main leeway this allowed was state_steps, which are sometimes CPU tensors. Since foreach _can_ handle CPU tensors, this should not introduce breakage.

Pull Request resolved: #95820
Approved by: https://github.com/albanD
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 5, 2023
Big OOP correction continued. Also added a test this time to verify the defaulting was as expected.

The key here is realizing that the grouping for foreach already assumes that the non-param tensorlists follow suit in dtype and device, so it is too narrow to check that _all_ tensors are on CUDA. The main leeway this allowed was state_steps, which are sometimes CPU tensors. Since foreach _can_ handle CPU tensors, this should not introduce breakage.

Pull Request resolved: pytorch/pytorch#95820
Approved by: https://github.com/albanD
pruthvistony pushed a commit to ROCm/pytorch that referenced this pull request May 3, 2023
* [optim] include nn.Parameter as foreach supported (pytorch#95811)

This PR is a result of a realization that models are NOT subscribed to the foreach defaulting, as has been claimed in our documentation for months now. BIG OOPS.

Pull Request resolved: pytorch#95811
Approved by: https://github.com/albanD

* [optim] Widen the cases for defaulting to foreach (pytorch#95820)

Big OOP correction continued. Also added a test this time to verify the defaulting was as expected.

The key here is realizing that the grouping for foreach already assumes that the non-param tensorlists follow suit in dtype and device, so it is too narrow to check that _all_ tensors are on CUDA. The main leeway this allowed was state_steps, which are sometimes CPU tensors. Since foreach _can_ handle CPU tensors, this should not introduce breakage.

Pull Request resolved: pytorch#95820
Approved by: https://github.com/albanD
@facebook-github-bot facebook-github-bot deleted the gh/janeyx99/34/head branch June 8, 2023 17:22