
Conversation

@wanchaol (Collaborator) commented Sep 16, 2020:

Stack from ghstack:

Differential Revision: D23935257

dr-ci bot commented Sep 16, 2020:

💊 CI failures summary and remediations

As of commit c3a26d6 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



wanchaol added a commit that referenced this pull request Sep 16, 2020
ghstack-source-id: 03d1191
Pull Request resolved: #44791
@vincentqb (Contributor) commented:

@egrefen @seba-1511 -- since you were part of the #39279 discussion, do you have any feedback on the current functional implementation?
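For context, this PR moves the Adam update math into a standalone function in torch/optim/functional.py that torch/optim/adam.py then calls, passing parameters, gradients, and state buffers in explicitly. The sketch below only illustrates that functional style; the function name, argument list, and defaults are illustrative and not necessarily the exact signature in this PR.

```python
import math
from typing import List

import torch
from torch import Tensor


def adam_sketch(params: List[Tensor],
                grads: List[Tensor],
                exp_avgs: List[Tensor],
                exp_avg_sqs: List[Tensor],
                state_steps: List[int],
                *,
                lr: float = 1e-3,
                beta1: float = 0.9,
                beta2: float = 0.999,
                eps: float = 1e-8,
                weight_decay: float = 0.0) -> None:
    """Stateless Adam step over pre-collected tensors (illustrative only)."""
    for i, param in enumerate(params):
        grad = grads[i]
        exp_avg, exp_avg_sq = exp_avgs[i], exp_avg_sqs[i]
        step = state_steps[i]

        if weight_decay != 0:
            # L2-style weight decay folded into the gradient, as in torch.optim.Adam.
            grad = grad.add(param, alpha=weight_decay)

        bias_correction1 = 1 - beta1 ** step
        bias_correction2 = 1 - beta2 ** step

        # Update biased first and second moment estimates in place.
        exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
        exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

        denom = (exp_avg_sq.sqrt() / math.sqrt(bias_correction2)).add_(eps)
        step_size = lr / bias_correction1
        param.addcdiv_(exp_avg, denom, value=-step_size)
```

The idea is that the Optimizer subclass keeps ownership of self.state and the param groups, and gathers these lists before delegating to the functional call.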

@albanD (Collaborator) left a comment:

Looks good.
Just the init, which should be moved back to being lazy.

state = self.state[p]
state['step'] = 0
# Exponential moving average of gradient values
state['exp_avg'] = torch.zeros_like(p, memory_format=torch.preserve_format)
Collaborator replied inline on the snippet above:

I don't think we want to move these to the init.
Some new features like this actually rely on the fact that our optimizers lazily initialize their buffers.
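To make the point above concrete, here is a minimal sketch (not the code in this PR, and showing only the bookkeeping, not the full Adam math) of what lazy buffer initialization looks like: per-parameter state is created on the first step() that sees the parameter rather than in __init__.

```python
import torch


class LazyAdamSketch(torch.optim.Optimizer):
    """Illustrative only: per-parameter state is created lazily in step()."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        defaults = dict(lr=lr, betas=betas, eps=eps)
        super().__init__(params, defaults)
        # Deliberately no state initialization here.

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    # Buffers appear only once the parameter is first stepped,
                    # which is the lazy initialization referred to above.
                    state['step'] = 0
                    state['exp_avg'] = torch.zeros_like(
                        p, memory_format=torch.preserve_format)
                    state['exp_avg_sq'] = torch.zeros_like(
                        p, memory_format=torch.preserve_format)
                state['step'] += 1
                # ... Adam update math would go here ...
```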

bias_correction2 = 1 - beta2 ** step

if weight_decay != 0:
grad = grad.add(p, alpha=weight_decay)
@vincentqb (Contributor) commented Sep 24, 2020:

p or param? It also looks like this line isn't covered by any current tests.

@wanchaol (Collaborator, Author) replied:

It seems test_adam in test_optim didn't cover the weight_decay case; I added a test case to cover that.
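For reference, here is a minimal sketch of the kind of check that exercises this branch (the actual test added to test_optim may look different): any Adam step taken with a nonzero weight_decay runs the grad.add(..., alpha=weight_decay) line, so a wrong variable name there surfaces immediately as a NameError.

```python
import torch


def adam_weight_decay_smoke_test():
    # Hypothetical check, not the test added in this PR.
    torch.manual_seed(0)
    weight = torch.randn(4, 3, requires_grad=True)
    opt = torch.optim.Adam([weight], lr=1e-2, weight_decay=0.5)

    (weight ** 2).sum().backward()
    before = weight.detach().clone()
    opt.step()  # exercises the weight_decay branch of the update

    # Sanity check: the parameter actually moved.
    assert not torch.allclose(weight.detach(), before)


adam_weight_decay_smoke_test()
```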

codecov bot commented Sep 25, 2020:

Codecov Report

Merging #44791 into gh/wanchaol/131/base will increase coverage by 0.01%.
The diff coverage is 98.03%.


@@                   Coverage Diff                    @@
##           gh/wanchaol/131/base   #44791      +/-   ##
========================================================
+ Coverage                 68.11%   68.13%   +0.01%     
========================================================
  Files                       394      394              
  Lines                     50957    50976      +19     
========================================================
+ Hits                      34710    34730      +20     
+ Misses                    16247    16246       -1     
Impacted Files Coverage Δ
torch/optim/adam.py 92.98% <96.55%> (+1.31%) ⬆️
torch/optim/functional.py 91.66% <100.00%> (+7.05%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3c8b9eb...c3a26d6.

@vincentqb (Contributor) left a comment:

LGTM!

@facebook-github-bot commented:

@wanchaol merged this pull request in 08caf15.
