Use more parallelism in attention block in prefill mode. #177

Merged
copybara-service[bot] merged 3 commits into google:dev from szabadka:gemma2
May 3, 2024

@szabadka (Collaborator) commented May 3, 2024

Move the loop over the tokens inside the attention block and then create kHeads * num_tokens threads.

This improves multi-threaded speed only for the Gemma 2B model, but for consistency the loop over the tokens is also moved inside the Griffin recurrent layer and the FFW layer. It is also preparation for using the MatMul operation later.
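To illustrate the restructuring described above, here is a minimal sketch, not the code from this PR. `ParallelFor` is a hypothetical stand-in for the thread pool, the function names are made up for illustration, and the per-task increment is a placeholder for the real per-head attention work. It assumes each (head, token) pair can be processed independently once its inputs are ready.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (an assumption, not the project's thread pool):
// calls func(task) once for every task in [0, num_tasks), spread over
// num_workers threads, and returns after all tasks have finished.
inline void ParallelFor(size_t num_tasks, size_t num_workers,
                        const std::function<void(size_t)>& func) {
  std::atomic<size_t> next{0};
  std::vector<std::thread> workers;
  num_workers = std::max<size_t>(1, num_workers);
  for (size_t w = 0; w < num_workers; ++w) {
    workers.emplace_back([&] {
      // Each worker repeatedly claims the next unprocessed task index.
      for (size_t task = next.fetch_add(1); task < num_tasks;
           task = next.fetch_add(1)) {
        func(task);
      }
    });
  }
  for (std::thread& t : workers) t.join();
}

// Token loop outside the parallel region: every region exposes only
// num_heads tasks, so at most num_heads workers can be busy at once.
inline void PerTokenParallelism(size_t num_heads, size_t num_tokens,
                                size_t num_workers, std::vector<int>& visits) {
  assert(visits.size() == num_heads * num_tokens);
  for (size_t token = 0; token < num_tokens; ++token) {
    ParallelFor(num_heads, num_workers, [&](size_t head) {
      ++visits[token * num_heads + head];  // placeholder for real work
    });
  }
}

// Token loop moved inside: a single region with num_heads * num_tokens
// tasks, each decoded back into a (head, token) pair.
inline void FlattenedParallelism(size_t num_heads, size_t num_tokens,
                                 size_t num_workers, std::vector<int>& visits) {
  assert(visits.size() == num_heads * num_tokens);
  ParallelFor(num_heads * num_tokens, num_workers, [&](size_t task) {
    const size_t head = task % num_heads;
    const size_t token = task / num_heads;
    ++visits[token * num_heads + head];  // placeholder for real work
  });
}
```

Both variants touch every (head, token) pair exactly once. Assuming, for example, 8 heads and 1600 prefill tokens, the first exposes 8 tasks per parallel region while the second exposes 12800 tasks in one region, which leaves more work available to keep 32 or 64 threads busy.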

Benchmark results (summarization with 1600 tokens for prefill and essay writing with 500 tokens for generation):

                   Prefill speed
Num threads      BEFORE       AFTER
32               61.76 t/s    65.08 t/s
64               89.46 t/s    98.62 t/s
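
For reference, the relative gains implied by the table can be computed as below. This is only an illustrative helper: `SpeedupPercent` and `PrintPrefillGains` are hypothetical names, and the numbers are copied from the table above.

```cpp
#include <cassert>
#include <cstdio>

// Relative change in percent when throughput goes from `before` to `after`
// (both in tokens per second).
inline double SpeedupPercent(double before, double after) {
  assert(before > 0.0);
  return (after / before - 1.0) * 100.0;
}

// Prints the prefill gains for the two thread counts in the table.
inline void PrintPrefillGains() {
  std::printf("32 threads: %+.1f%%\n", SpeedupPercent(61.76, 65.08));
  std::printf("64 threads: %+.1f%%\n", SpeedupPercent(89.46, 98.62));
}
```

This works out to roughly 5.4% at 32 threads and 10.2% at 64 threads.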

@jan-wassenberg (Member) left a comment


Nice, loop + MatVec is starting to look a lot like a matmul!
One small fix and a question:

@jan-wassenberg added the copybara-import label (Trigger Copybara for merging pull requests) May 3, 2024
@copybara-service copybara-service bot merged commit 8ed22e5 into google:dev May 3, 2024
@szabadka szabadka deleted the gemma2 branch May 3, 2024 15:03