Conversation
gemma/gemma.cc
Outdated
There was a problem hiding this comment.
Isn't the default ctor (in the header) enough to get us a null impl_?
There was a problem hiding this comment.
There was a compilation error because the unique ptr did not know that size of the forward declared class, so I had to add something to the cc file.
There was a problem hiding this comment.
Oh, I see. You are right, all the special functions indeed need to be in the .cc after the definition of Impl. I was confused by the body of the ctor here - that is unnecessary, we can just write GemmaTokenizer::GemmaTokenizer() = default. I'll add to my TODO.
gemma/gemma.cc
Outdated
There was a problem hiding this comment.
hm, I'm not sure we actually want to encourage f32 weights. For training, wouldn't it make sense to use bf16 weights? Those are considered compressed, though we'd have to build with -DGEMMA_WEIGHT_T=hwy::bfloat16_t. That should be faster, and let us only have a single function here. And maybe we could even get rid of kWeightsAreCompressed?
Note that 'compressed' can also mean f32. It would be nice to get rid of the duplicated non-compressed code now that we have the separate compress_weights binary.
There was a problem hiding this comment.
I removed these for now, since training works if I change kWeightsAreCompressed to false.
gemma/gemma.cc
Outdated
There was a problem hiding this comment.
It would be nice to avoid this duplication. It seems that you want to use f32 (if not bf16, see above) weights for the backprop. What prevents us from doing that with kWeightsAreCompressed=true, and setting GEMMA_WEIGHT_T to float?
There was a problem hiding this comment.
for now I reverted this part, but still keeping kWeightsAreCompressed
Drive-by: Fix compilation errors and tests for backprop functions.
Drive-by: Fix compilation errors and tests for backprop functions.