-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Include lr scaling info in adamw weight_decay docstring #154113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/154113
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 21a5219 with merge base d7a83ab ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
janeyx99
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have two concerns, but maybe it's okay.
- While I like having some mention of this over none at all, I'm not sure the doc reader will necessarily know what this means from just the phrase. It may be better to include a bigger
note::and include the mathematical difference like mentioned in the issue. - We should mention this in all the *AdamWs, NAdam + RAdam as they have decoupled_weight_decay options.
One could link to the comment of the original implementation issue, here. It explains the "problem" nicely. Would this be preferable over a description in the docstring?
I could add the note to all of them. This would make it even more desirable to link to an existing explanation compared to explaining the issue in detail in the docstring itself. Also, if #22343 would be implemented things would/could change again, just mentioning that |
|
#22343 will likely not be implemented any time soon. A description in the docstr is still preferred (as a digestible description) and I'm not sure we typically link to issues in docs (though I'm not theoretically opposed). You can define a docstr in optimizer.py and import it in all the relevant files for deduplication. |
|
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
Adds documentation as discussed in #38853