[training] use the lr when using 8bit adam. #9796
@linoytsaban sorry, I think I went too soon with the PR. I realized that we ALWAYS pack "lr" in `params_to_optimize`. So, I think you were right not to pass `lr` to `optimizer_class`. Cc: @a-r-r-o-w for the changed Cog scripts.
a-r-r-o-w left a comment:
Looks good! IIUC, since the LR is already packed with the parameters, there is no need to pass it to `optimizer_class` when creating it. Was this inconsistency present in some of our scripts?
Correct. And this PR resolves that.
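The point of this exchange is that a learning rate stored inside each parameter group takes precedence over an `lr` passed to the optimizer constructor, so passing both is redundant. The following is a simplified, hypothetical model of that merge, not code from this PR: it assumes the `setdefault`-style behavior PyTorch optimizers apply when adding parameter groups, and `build_param_groups` and the group contents are illustrative names only.

```python
# Simplified, hypothetical model (NOT the real torch.optim.Optimizer) of how a
# PyTorch-style optimizer fills constructor defaults into each parameter group.
from typing import Any


def build_param_groups(param_groups: list[dict[str, Any]], **defaults: Any) -> list[dict[str, Any]]:
    """Return copies of param_groups with constructor defaults filled in.

    Uses setdefault, so a key already present in a group (such as a
    per-group "lr") is never overwritten by the constructor value.
    """
    merged = []
    for group in param_groups:
        group = dict(group)  # copy so the caller's dicts stay untouched
        for name, default in defaults.items():
            group.setdefault(name, default)
        merged.append(group)
    return merged


# Hypothetical groups in the style of `params_to_optimize`: each carries its own lr.
params_to_optimize = [
    {"params": ["transformer_weights"], "lr": 1e-4},
    {"params": ["text_encoder_weights"], "lr": 5e-6},
]

# Passing lr to the "constructor" as well has no effect on these groups...
with_lr = build_param_groups(params_to_optimize, lr=1e-3, weight_decay=1e-2)
# ...and omitting it yields the same per-group learning rates.
without_lr = build_param_groups(params_to_optimize, weight_decay=1e-2)

print([g["lr"] for g in with_lr])     # [0.0001, 5e-06]
print([g["lr"] for g in without_lr])  # [0.0001, 5e-06]
```

Under this model, the constructor's `lr` only matters for a group that lacks its own `"lr"` key, which is why every group must carry one once the constructor argument is dropped.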
* use the lr when using 8bit adam.
* remove lr as we pack it in params_to_optimize.

---------

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
What does this PR do?
Oops.
@a-r-r-o-w FYI.