Add first version of backpropagation support. #203
copybara-service[bot] merged 7 commits into google:dev
Conversation
gemma/backward_scalar_test.cc
Outdated
Consider seeding generators with hwy::Unpredictable1() from nanobenchmark.h, so that the outputs are not constexpr?
Wouldn't that cause flakiness? I did it like this so the tests always execute the same thing.
No worries: Unpredictable1 always returns 1, but in a way that the compiler cannot know the return value at the call site. Basically, it's just not inlined. So you can write 42 * Unpredictable1() and consistently get the same results as now.
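To make the suggestion concrete, here is a minimal sketch assuming hwy::Unpredictable1() from hwy/nanobenchmark.h; the surrounding function and variable names are illustrative only:

```cpp
#include <random>

#include "hwy/nanobenchmark.h"  // hwy::Unpredictable1

void SeedExample() {
  // Unpredictable1() returns 1 at runtime, but the optimizer cannot see
  // that from the call site, so the seed (and hence the generator's
  // outputs) cannot be constant-folded. The sequence is still identical
  // on every run, so the test stays deterministic.
  std::mt19937 gen(42 * hwy::Unpredictable1());
  std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
  const float x = dist(gen);  // not a compile-time constant
  (void)x;
}
```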
gemma/gemma_test.cc
Outdated
Should we revert these changes?
I changed these to DISABLED_ so that ctest would skip them, since they need the external weights file. The tests can still be run manually by adding --gtest_also_run_disabled_tests.
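For readers unfamiliar with the convention: googletest skips any test whose name starts with DISABLED_, and the --gtest_also_run_disabled_tests flag re-enables them. A minimal sketch with a hypothetical test name:

```cpp
#include "gtest/gtest.h"

// Skipped by default under ctest; executed only when the binary is
// invoked with --gtest_also_run_disabled_tests (and the external
// weights file is available).
TEST(GemmaTest, DISABLED_CrossEntropySmall) {
  // ... load external weights and check cross-entropy ...
}
```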
Hm, not sure everyone will know that they have to run with --gtest_also_run_disabled_tests.
I understand you want to run only the cross-entropy tests sometimes. Maybe a good way forward is to move those into a separate test?
jan-wassenberg left a comment
Wow! That's a lot of new code :)
Consider moving the backward/forward files to a subdirectory?
Done.
This is still in progress / experimental: it is currently implemented only for standard Gemma MQA attention layers, and no parallelism has been added yet for the backward pass.
Since we need to remember all activations from all layers, the forward pass was also reimplemented with a new activation data structure.
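To make the last point concrete, here is a minimal sketch of the idea; all names are hypothetical and do not reflect the PR's actual types. The forward pass caches per-layer activations so the backward pass can reuse them when computing gradients:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration; the PR's real structure differs.
struct LayerActivations {
  std::vector<float> pre_attention;  // layer input, saved for attention grads
  std::vector<float> attention_out;  // attention output, reused in backward
  std::vector<float> ffn_out;        // feed-forward output
};

struct ForwardActivations {
  // One entry per transformer layer. Unlike inference, nothing is
  // overwritten in place, because the backward pass walks the layers
  // in reverse and needs every saved activation.
  std::vector<LayerActivations> layers;
  explicit ForwardActivations(size_t num_layers) : layers(num_layers) {}
};
```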