Skip to content

Add first version of backpropagation support.#203

Merged
copybara-service[bot] merged 7 commits intogoogle:devfrom
szabadka:backprop5
Jun 4, 2024
Merged

Add first version of backpropagation support.#203
copybara-service[bot] merged 7 commits intogoogle:devfrom
szabadka:backprop5

Conversation

@szabadka
Copy link
Collaborator

@szabadka szabadka commented Jun 3, 2024

This is still in progress / experimental, currently it is only implemented for normal gemma MQA attention layers, and no parallelism is added yet for backward pass.

Since we need to remember all activations from all layers, the forward pass was also reimplemented with a new activation data structure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider seeding generators with hwy::Unpredictable1() from nanobenchmark.h, so that the outputs are not constexpr?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that cause flakyness, I did it like this so the tests always execute the same thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, Unpredictable1 always returns 1, but in a way that the compiler does not know from the call site what the return value will be. Basically it's just not inline. So you can write 42*Unpredictable1 and get the same results as now, consistently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we revert these changes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed these to disabled so that ctest would skip them, since these need the external weights file. The test can still be run manually by adding --gtest_also_run_disabled_tests

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, not sure everyone will know that they have to run with also_run..
I understand you want to run only the cross-entropy tests sometimes. Maybe a good way forward is to move those into a separate test?

Copy link
Member

@jan-wassenberg jan-wassenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! That's a lot of new code :)
Consider moving the backward/forward files to a subdirectory?

szabadka added 5 commits June 4, 2024 08:37
This is still in progress / experimental, currently it is only
implemented for normal gemma MQA attention layers, and no
parallelism is added yet for backward pass.

Since we need to remember all activations from all layers, the
forward pass was also reimplemented with a new activation data
structure.
@szabadka
Copy link
Collaborator Author

szabadka commented Jun 4, 2024

Wow! That's a lot of new code :) Consider moving the backward/forward files to a subdirectory?

Done.

@szabadka szabadka requested a review from jan-wassenberg June 4, 2024 11:47
@jan-wassenberg jan-wassenberg added the copybara-import Trigger Copybara for merging pull requests label Jun 4, 2024
@copybara-service copybara-service bot merged commit 25d9c8f into google:dev Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

copybara-import Trigger Copybara for merging pull requests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants