Add Adam optimizer. by szabadka · Pull Request #212 · google/gemma.cpp

szabadka · 2024-06-06T16:29:11Z

Drive-by: Fix compilation errors and tests for backprop functions.

gemma/common.h

gemma/configs.h

jan-wassenberg · 2024-06-06T16:46:35Z

gemma/gemma.cc

Isn't the default ctor (in the header) enough to get us a null impl_?

There was a compilation error because the unique_ptr did not know the size of the forward-declared class, so I had to add something to the .cc file.

Oh, I see. You are right, all the special functions indeed need to be in the .cc after the definition of Impl. I was confused by the body of the ctor here - that is unnecessary, we can just write GemmaTokenizer::GemmaTokenizer() = default. I'll add to my TODO.
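The incomplete-type issue above is the usual pimpl pitfall: destroying a std::unique_ptr<Impl> requires Impl to be a complete type, so every special member function that may destroy impl_ (the destructor, move assignment, and in practice the constructors as well) has to be defined in the .cc after Impl. A minimal single-file sketch of that layout follows. The path-taking constructor, the contents of Impl and the HasImpl() helper are hypothetical and are not gemma.cpp's actual API.

```cpp
#include <memory>
#include <string>
#include <utility>

// --- Header part (e.g. tokenizer.h): Impl is only forward-declared. ---
class GemmaTokenizer {
 public:
  GemmaTokenizer();
  // Hypothetical constructor, for illustration only.
  explicit GemmaTokenizer(const std::string& model_path);
  // Declared here but defined in the .cc after Impl is complete, because
  // destroying std::unique_ptr<Impl> needs the full definition of Impl.
  ~GemmaTokenizer();
  GemmaTokenizer(GemmaTokenizer&& other);
  GemmaTokenizer& operator=(GemmaTokenizer&& other);

  // Hypothetical helper so the sketch can be observed from outside.
  bool HasImpl() const { return impl_ != nullptr; }

 private:
  class Impl;
  std::unique_ptr<Impl> impl_;
};

// --- Source part (e.g. tokenizer.cc): Impl is now a complete type. ---
class GemmaTokenizer::Impl {
 public:
  explicit Impl(std::string path) : path_(std::move(path)) {}

 private:
  std::string path_;  // Placeholder state; the real Impl differs.
};

// No body needed: a defaulted constructor leaves impl_ null.
GemmaTokenizer::GemmaTokenizer() = default;
GemmaTokenizer::GemmaTokenizer(const std::string& model_path)
    : impl_(std::make_unique<Impl>(model_path)) {}
GemmaTokenizer::~GemmaTokenizer() = default;
GemmaTokenizer::GemmaTokenizer(GemmaTokenizer&& other) = default;
GemmaTokenizer& GemmaTokenizer::operator=(GemmaTokenizer&& other) = default;
```

Defaulting these members out of line keeps them as short as `= default` in the header would be, while moving the point where unique_ptr's deleter is instantiated to a place where Impl is complete.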

jan-wassenberg · 2024-06-06T16:49:35Z

gemma/gemma.cc

hm, I'm not sure we actually want to encourage f32 weights. For training, wouldn't it make sense to use bf16 weights? Those are considered compressed, though we'd have to build with -DGEMMA_WEIGHT_T=hwy::bfloat16_t. That should be faster, and let us only have a single function here. And maybe we could even get rid of kWeightsAreCompressed?

Note that 'compressed' can also mean f32. It would be nice to get rid of the duplicated non-compressed code now that we have the separate compress_weights binary.

I removed these for now, since training works if I change kWeightsAreCompressed to false.

jan-wassenberg · 2024-06-06T16:55:36Z

gemma/gemma.cc

It would be nice to avoid this duplication. It seems that you want to use f32 (if not bf16, see above) weights for the backprop. What prevents us from doing that with kWeightsAreCompressed=true, and setting GEMMA_WEIGHT_T to float?

For now I reverted this part, but I am still keeping kWeightsAreCompressed.
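On the suggestion of a single function instead of duplicated compressed and non-compressed paths: a rough sketch, assuming the weight type is selected at build time with -DGEMMA_WEIGHT_T=... and converts to float. The float fallback, WeightT and SumWeights are hypothetical; they only illustrate how one templated code path could serve f32 or bf16 weights, and are not gemma.cpp code.

```cpp
#include <cstddef>

// The build may override this, e.g. -DGEMMA_WEIGHT_T=float.
// The float fallback is an assumption made for this sketch only.
#ifndef GEMMA_WEIGHT_T
#define GEMMA_WEIGHT_T float
#endif

using WeightT = GEMMA_WEIGHT_T;

// Hypothetical: one function, generic over the stored weight type, that
// widens each weight to float instead of keeping a separate f32-only branch.
template <typename T>
float SumWeights(const T* weights, size_t n) {
  float sum = 0.0f;
  for (size_t i = 0; i < n; ++i) {
    sum += static_cast<float>(weights[i]);
  }
  return sum;
}
```

With this shape, choosing f32 for backprop becomes a build setting rather than a second copy of the code.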

gemma/gemma.h

gemma/weights.h

szabadka requested a review from jan-wassenberg June 6, 2024 16:29

jan-wassenberg requested changes Jun 6, 2024

Add Adam optimizer.

c004799

Drive-by: Fix compilation errors and tests for backprop functions.

szabadka force-pushed the adam2 branch from 63bd8bf to c004799 June 6, 2024 18:42

szabadka requested a review from jan-wassenberg June 6, 2024 18:46

jan-wassenberg approved these changes Jun 7, 2024

jan-wassenberg added the copybara-import label (Trigger Copybara for merging pull requests) Jun 7, 2024

copybara-service bot merged commit f7ac709 into google:dev Jun 7, 2024

szabadka deleted the adam2 branch June 7, 2024 09:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Adam optimizer.#212

Add Adam optimizer.#212
copybara-service[bot] merged 1 commit intogoogle:devfrom
szabadka:adam2

szabadka commented Jun 6, 2024

Uh oh!

Uh oh!

Uh oh!

jan-wassenberg Jun 6, 2024

Uh oh!

szabadka Jun 6, 2024

Uh oh!

jan-wassenberg Jun 7, 2024

Uh oh!

jan-wassenberg Jun 6, 2024

Uh oh!

szabadka Jun 6, 2024

Uh oh!

jan-wassenberg Jun 6, 2024

Uh oh!

szabadka Jun 6, 2024

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

szabadka commented Jun 6, 2024

Uh oh!

Uh oh!

Uh oh!

jan-wassenberg Jun 6, 2024

Choose a reason for hiding this comment

Uh oh!

szabadka Jun 6, 2024

Choose a reason for hiding this comment

Uh oh!

jan-wassenberg Jun 7, 2024

Choose a reason for hiding this comment

Uh oh!

jan-wassenberg Jun 6, 2024

Choose a reason for hiding this comment

Uh oh!

szabadka Jun 6, 2024

Choose a reason for hiding this comment

Uh oh!

jan-wassenberg Jun 6, 2024

Choose a reason for hiding this comment

Uh oh!

szabadka Jun 6, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants