[optimizer] refactor Adam to use functional API #44791
Conversation
💊 CI failures summary and remediations

As of commit c3a26d6 (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and has been revised 51 times.
@egrefen @seba-1511 -- since you were part of the #39279 discussion, do you have any feedback on the current functional implementation?
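To make the discussion concrete, here is a minimal, self-contained sketch of the pattern this refactor follows: `step()` gathers parameters, gradients, and state buffers into lists and dispatches to a stateless routine. The name `functional_adam`, its exact signature, and the omission of amsgrad are illustrative assumptions on my part; the actual function in torch/optim/functional.py may look different.

```python
import math
from typing import List

import torch
from torch import Tensor


def functional_adam(params: List[Tensor],
                    grads: List[Tensor],
                    exp_avgs: List[Tensor],
                    exp_avg_sqs: List[Tensor],
                    steps: List[int],
                    *,
                    lr: float,
                    beta1: float,
                    beta2: float,
                    eps: float,
                    weight_decay: float) -> None:
    # Stateless Adam update: all state arrives as explicit arguments;
    # nothing is read from or written to an optimizer object.
    # Step counts are assumed to be >= 1 (already incremented by the caller).
    for i, param in enumerate(params):
        grad = grads[i]
        exp_avg, exp_avg_sq = exp_avgs[i], exp_avg_sqs[i]
        step = steps[i]

        bias_correction1 = 1 - beta1 ** step
        bias_correction2 = 1 - beta2 ** step

        if weight_decay != 0:
            grad = grad.add(param, alpha=weight_decay)

        # Decayed moving averages of the gradient and its square.
        exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
        exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

        denom = (exp_avg_sq.sqrt() / math.sqrt(bias_correction2)).add_(eps)
        step_size = lr / bias_correction1
        param.addcdiv_(exp_avg, denom, value=-step_size)
```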
[ghstack-poisoned]
albanD left a comment
Looks good. Just the init should be moved back to being lazy.
torch/optim/adam.py
Outdated
state = self.state[p]
state['step'] = 0
# Exponential moving average of gradient values
state['exp_avg'] = torch.zeros_like(p, memory_format=torch.preserve_format)
I don't think we want to move these to the init. Some new features like this actually rely on the fact that our optimizers lazily initialize their buffers.
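For context, here is a rough sketch of what lazy initialization looks like inside `step()`: buffers for a parameter are allocated the first time that parameter is processed, rather than in `__init__`. The helper name `lazy_init_sketch` is hypothetical and this is not the PR's final code; the state keys mirror the diff above.

```python
import torch


def lazy_init_sketch(optimizer):
    # Illustrative only: allocate Adam buffers on first use of each parameter.
    for group in optimizer.param_groups:
        for p in group['params']:
            if p.grad is None:
                continue
            state = optimizer.state[p]
            if len(state) == 0:  # first time this parameter is seen
                state['step'] = 0
                # Exponential moving average of gradient values
                state['exp_avg'] = torch.zeros_like(p, memory_format=torch.preserve_format)
                # Exponential moving average of squared gradient values
                state['exp_avg_sq'] = torch.zeros_like(p, memory_format=torch.preserve_format)
```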
torch/optim/functional.py
Outdated
bias_correction2 = 1 - beta2 ** step

if weight_decay != 0:
    grad = grad.add(p, alpha=weight_decay)
p or param? It also looks like this line isn't covered by any current test.
Seems like test_adam in test_optim didn't cover the weight_decay case; added a test case to cover that.
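For illustration only, a regression test in this spirit (not the exact test added to test_optim) could pass a zero gradient with a nonzero weight_decay, so that any parameter movement must come from the `grad = grad.add(p, alpha=weight_decay)` branch:

```python
import torch


def test_adam_weight_decay_branch():
    # Hypothetical sketch: with a zero gradient and weight_decay != 0,
    # the update is driven entirely by the weight-decay term, so the
    # parameter must still move after one step.
    p = torch.ones(4, requires_grad=True)
    opt = torch.optim.Adam([p], lr=0.1, weight_decay=0.5)
    p.grad = torch.zeros_like(p)  # force grad == 0 so only weight decay acts
    before = p.detach().clone()
    opt.step()
    assert not torch.allclose(p.detach(), before)
```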
Codecov Report

| Coverage Diff | gh/wanchaol/131/base | #44791 | +/- |
|---|---|---|---|
| Coverage | 68.11% | 68.13% | +0.01% |
| Files | 394 | 394 | |
| Lines | 50957 | 50976 | +19 |
| Hits | 34710 | 34730 | +20 |
| Misses | 16247 | 16246 | -1 |

Continue to review the full report at Codecov.
vincentqb left a comment
LGTM!
Stack from ghstack:
Differential Revision: D23935257