[optim][adamw] default to foreach when CUDA + differentiable=False #92306

janeyx99 · 2023-01-17T15:30:03Z

Stack from ghstack (oldest at bottom):

BC-breaking note

Algorithms `{Adadelta, Adagrad, Adam, Adamax, AdamW, ASGD, NAdam, RAdam, RMSProp, RProp, SGD}` default to faster `foreach` implementation when on CUDA + differentiable=`False`

When applicable, this changes the default behavior of `step()`, and of anything that calls directly into `adadelta(...)`, `adagrad(...)`, `adam(...)`, `adamax(...)`, `adamw(...)`, `asgd(...)`, `nadam(...)`, `radam(...)`, `rmsprop(...)`, `rprop(...)`, or `sgd(...)`, to use the foreach implementation instead of the for-loop for better performance. Applicable means:

- the user has not specified kwargs relating to implementation (`foreach`, `fused`, or `differentiable`),
- all tensors are native tensors (not subclasses) and on CUDA,
- `torch.jit.is_scripting()` returns `False`.
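The conditions above can be read as a small predicate. The following is only a rough sketch of that rule, not the code in this PR: `ParamInfo` and `picks_foreach` are hypothetical names, and `ParamInfo` is a plain stand-in for a tensor that records just the two properties the rule inspects.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ParamInfo:
    """Hypothetical stand-in for a tensor (not a PyTorch type)."""
    is_cuda: bool
    is_subclass: bool = False  # True for anything that is not a plain torch.Tensor


def picks_foreach(
    params: Sequence[ParamInfo],
    foreach: Optional[bool] = None,
    fused: Optional[bool] = None,
    differentiable: bool = False,
    scripting: bool = False,
) -> bool:
    """Return True if the foreach implementation would be selected.

    Assumed reading of the BC-breaking note; the real helper may differ.
    """
    # An explicit user choice is never overwritten.
    if foreach is not None:
        return foreach
    # Asking for another implementation keeps the default off.
    if fused or differentiable:
        return False
    # The default does not apply under TorchScript.
    if scripting:
        return False
    # Every tensor must be a native (non-subclass) CUDA tensor.
    return all(p.is_cuda and not p.is_subclass for p in params)
```

For instance, `picks_foreach([ParamInfo(is_cuda=True)])` selects foreach, while mixing in a single CPU tensor or passing `foreach=False` does not.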

When these conditions are satisfied, the implementation used will match the one used when passing `foreach=True`. A user-specified value for `foreach` will NOT be overwritten, so explicit user selections are preserved. For more details, check the documentation. There should be no significant differences between the results returned by these optimizers. To revert to the old behavior for, say, adam, call `adam(..., foreach=False, ...)` or initialize Adam with `Adam(..., foreach=False, ...)`.
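As a minimal sketch of opting out, the snippet below constructs `AdamW` with `foreach=False` so the for-loop implementation is kept regardless of device. The model, shapes, and learning rate are arbitrary placeholders chosen for illustration, and the example runs on CPU so it does not assume a GPU is present.

```python
import torch

# Placeholder model; any nn.Module with parameters would do.
model = torch.nn.Linear(4, 2)

# Passing foreach=False explicitly opts out of the new default.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, foreach=False)

# One ordinary training step to show nothing else changes.
loss = model(torch.randn(8, 4)).sum()
loss.backward()
opt.step()
opt.zero_grad()
```

The chosen value is stored in the optimizer's defaults and in each param group, so it can be inspected with `opt.defaults["foreach"]`.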

[ghstack-poisoned]

pytorch-bot · 2023-01-17T15:30:05Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/92306

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b78c9b2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

albanD

SGTM

torch/optim/adamw.py

…le=False" cc ezyang gchanan [ghstack-poisoned]

ghstack-source-id: fc22c8b Pull Request resolved: #92306

janeyx99 · 2023-01-17T20:04:29Z

@pytorchbot merge

pytorchmergebot · 2023-01-17T20:06:07Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

[optim][adamw] default to foreach when CUDA + differentiable=False

3bc8644

[ghstack-poisoned]

janeyx99 requested a review from albanD as a code owner January 17, 2023 15:30

janeyx99 mentioned this pull request Jan 17, 2023

[optim] abstract out _default_to_foreach_util #92305

Closed

janeyx99 added the `module: bc-breaking` and `release notes: nn` labels Jan 17, 2023

pytorch-bot bot added the `topic: bc breaking` label Jan 17, 2023

albanD approved these changes Jan 17, 2023

torch/optim/adamw.py (outdated, resolved)

Update on "[optim][adamw] default to foreach when CUDA + differentiab…

125537e

…le=False" cc ezyang gchanan [ghstack-poisoned]

Update on "[optim][adamw] default to foreach when CUDA + differentiab…

b78c9b2

…le=False" cc ezyang gchanan [ghstack-poisoned]

janeyx99 added a commit that referenced this pull request Jan 17, 2023

[optim][adamw] default to foreach when CUDA + differentiable=False

d490eb6

ghstack-source-id: fc22c8b Pull Request resolved: #92306

pytorch-bot bot added the `ciflow/trunk` label Jan 17, 2023

pytorchmergebot added the Merged label Jan 18, 2023

pytorchmergebot closed this in 0157e2e Jan 18, 2023

facebook-github-bot deleted the gh/janeyx99/8/head branch June 8, 2023 17:23
