@farhadrgh
Contributor

This PR fixes #41477

The current Adam implementation applies L2 regularization, not decoupled weight decay. However, the change proposed in #41477 was motivated by Line 12 of Algorithm 2 in the paper "Decoupled Weight Decay Regularization", which describes the decoupled variant.
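
To make the distinction concrete, here is a minimal scalar sketch of one Adam step in plain Python. It is not the PyTorch source. The function name `adam_step` and the `decoupled` flag are hypothetical, introduced only for illustration, and the decoupled branch assumes the decay term is scaled by `lr` (the paper uses a separate schedule multiplier).

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=False):
    """One scalar Adam update (illustrative sketch, hypothetical helper).

    Returns (new_param, new_m, new_v). `t` is the 1-based step count.
    """
    if weight_decay and not decoupled:
        # L2 regularization: the penalty is folded into the gradient,
        # so it is later rescaled by the adaptive denominator.
        grad = grad + weight_decay * param

    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad

    # Bias correction for the zero-initialized averages.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    new_param = param - lr * m_hat / (math.sqrt(v_hat) + eps)

    if weight_decay and decoupled:
        # Decoupled weight decay: shrink the parameter directly,
        # independent of the gradient statistics (assumed lr scaling).
        new_param -= lr * weight_decay * param

    return new_param, m, v


# Same inputs, zero loss gradient: only the decay term acts.
l2_param, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, weight_decay=0.1)
dec_param, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, weight_decay=0.1,
                            decoupled=True)
```

With a zero loss gradient on the first step, the L2 variant moves the parameter by roughly `lr`, because the penalty is normalized like any other gradient, while the decoupled variant moves it by `lr * weight_decay * param`.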

Please let me know if you have other suggestions about how to deliver this info in the docs.
cc @ezyang

@gchanan requested review from @ezyang and @vincentqb on July 20, 2020 23:45
@gchanan added the "triaged" label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Jul 20, 2020
Contributor

@vincentqb left a comment

LGTM, thanks!

Contributor

@facebook-github-bot left a comment

@vincentqb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@vincentqb merged this pull request in 4b4273a.

Labels

Merged, open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Development

Successfully merging this pull request may close these issues.

Adam implementation different from paper

6 participants