Make AdamW, NAdam & RAdam differentiable #86183
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86183. Note: links to docs will display an error until the docs builds have completed. ✅ No Failures, 1 Pending as of commit dc2140d. This comment was automatically generated by Dr. CI and updates every 15 minutes.
torch/optim/nadam.py (Outdated)
is `.item()` needed here, in fact?
Yeah... `addcdiv_` requires `value` to be a Python scalar. I guess we could unify both execution paths to avoid device synchronization, but I'm not sure. cc @albanD (who was also concerned about this)
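For context, a minimal sketch of why the two paths diverge (illustrative only, not the PR's actual code; `apply_update` and `step_size` are made-up names): `addcdiv_`'s `value=` must be a Python number, so the eager path calls `.item()`, which synchronizes with the device when the scalar lives on the GPU, while a differentiable path keeps the step size as a tensor, e.g. by folding it into the numerator.

```python
import torch

def apply_update(param, exp_avg, denom, step_size, differentiable):
    # Hypothetical helper, only to illustrate the trade-off discussed above.
    if differentiable:
        # Keep step_size as a tensor so autograd can trace through it:
        # fold it into the numerator instead of passing it as `value=`.
        param.addcdiv_(exp_avg * step_size, denom)
    else:
        # `value=` must be a Python scalar; .item() forces a device sync
        # when step_size is a GPU tensor.
        param.addcdiv_(exp_avg, denom, value=step_size.item())
```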
Maybe worth introducing an idiomatic utility `clone_if(flag)` lambda?
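Something along these lines, presumably (a hedged sketch; `clone_if` is not an existing API, and the buffer name is made up):

```python
import torch

def clone_if(t, flag):
    # Copy only when running the differentiable path, so in-place updates
    # don't overwrite values autograd may still need; otherwise reuse t.
    return t.clone() if flag else t

exp_avg = torch.zeros(3)
buf = clone_if(exp_avg, flag=True)     # independent copy
alias = clone_if(exp_avg, flag=False)  # same storage as exp_avg
```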
albanD left a comment:
Wait, is this included in the other PR? :p
Force-pushed from e30bc50 to 83fe75e (Compare)
Separated all the PRs!
albanD left a comment:
Sounds good to me!
Force-pushed from 83fe75e to c19e5fc (Compare)
Any thoughts about the `clone_if` idiom? Or are explicit, separate if-statement paths better?
I'm not sure?
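For reference, the two styles under discussion might look like this side by side (an illustrative sketch with made-up buffer names, not code from this PR):

```python
import torch

def clone_if(t, flag):
    return t.clone() if flag else t

exp_avg, grad = torch.zeros(3), torch.ones(3)
beta, differentiable = 0.9, True

# Option A: one code path, with a conditional copy up front.
buf = clone_if(exp_avg, differentiable)
buf.mul_(beta).add_(grad, alpha=1 - beta)

# Option B: explicit branches, duplicating the update in each path.
if differentiable:
    buf = exp_avg.clone().mul_(beta).add_(grad, alpha=1 - beta)
else:
    buf = exp_avg.mul_(beta).add_(grad, alpha=1 - beta)
```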
Force-pushed from 1e4ec86 to 8484ad3 (Compare)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: the following mandatory check(s) failed. Dig deeper by viewing the failures on hud. Details for Dev Infra team: raised by workflow job.
Force-pushed from 8484ad3 to b1f8705 (Compare)
Well, I meant that instead of ...
Then instead of ..., it could always be written as: ... Also, are there functorch higher-order function wrappers that can act somehow like `torch.add(tensor, eps, inplace=not differentiable)`, to choose between `torch.add` and `torch.add_`?
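One way to read that suggestion (a hypothetical wrapper, not an existing torch or functorch API):

```python
import torch

def add_maybe_inplace(t, other, *, inplace):
    # A single call site picks the in-place or out-of-place variant,
    # instead of duplicating both branches throughout the optimizer code.
    return t.add_(other) if inplace else torch.add(t, other)

x = torch.ones(3)
y = add_maybe_inplace(x, 1e-8, inplace=False)  # new tensor, x unchanged
z = add_maybe_inplace(x, 1e-8, inplace=True)   # mutates and returns x
```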
Yes, we go twice over the memory, once to copy it and once to add.
Maybe? That would be a lot of work to add that to all APIs :p
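To make the memory point concrete: the copy-then-add formulation traverses the buffer twice, whereas a single out-of-place add produces the result in one pass (a small illustration, not PR code):

```python
import torch

x = torch.randn(1000)
eps = 1e-8

# Two passes over the memory: one to copy, one to add in place.
y = x.clone()
y.add_(eps)

# One pass: the out-of-place add writes the new tensor directly.
z = torch.add(x, eps)
```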
Actually, maybe not so many; not that many ops support proper in-place...
Also, functorch really doesn't like in-place in general ;) So I don't think they will be happy with adding an inplace kwarg haha
Well, maybe not in functorch :) but in core
Force-pushed from 0148172 to 8f160b2 (Compare)
You can skip the dynamo errors.
Force-pushed from 8f160b2 to 3ff8cd2 (Compare)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Blocked by #86096