Use more parallelism in the QKV projections in MQA mode. #170

Merged: copybara-service[bot] merged 1 commit into google:dev from szabadka:gemma2 on Apr 30, 2024


@szabadka (Collaborator)

Instead of MatVecLoop, we use MatVec, and we combine k and v into one vector of length 2 * kQKVDim so that the K and V projections can be computed in a single MatVec operation.

Benchmark results (summarization with 1600 tokens for prefill and essay writing with 500 tokens for generation):

                   Prefill speed                Generation speed
Num threads      BEFORE       AFTER            BEFORE       AFTER
4                 9.81 t/s     9.96 t/s       8.39 t/s     8.46 t/s
18               31.50 t/s    36.67 t/s      23.10 t/s    25.83 t/s
32               45.36 t/s    58.91 t/s      27.60 t/s    31.25 t/s
64               57.72 t/s    80.64 t/s      35.40 t/s    39.76 t/s
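
A minimal sketch of the fused K/V projection described above, not the actual gemma.cpp code. `MatVecSketch` and `ProjectKVFused` are hypothetical names, the signatures do not match the real `MatVec`, and the sketch assumes the K weight rows are stored immediately before the V weight rows in one row-major matrix. The idea it illustrates is that a single call over 2 * kQKVDim output rows exposes twice as many independent rows as a call over kQKVDim rows, which presumably gives a thread pool more work to distribute.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a MatVec kernel: out[r] = dot(row r of w, x).
// `w` is row-major with `rows` rows and `cols` columns. Every output row is
// independent of the others, so a real kernel could split rows across threads;
// this sketch just loops serially.
void MatVecSketch(const std::vector<float>& w, std::size_t rows,
                  std::size_t cols, const std::vector<float>& x, float* out) {
  assert(w.size() == rows * cols);
  assert(x.size() == cols);
  for (std::size_t r = 0; r < rows; ++r) {
    float sum = 0.0f;
    for (std::size_t c = 0; c < cols; ++c) {
      sum += w[r * cols + c] * x[c];
    }
    out[r] = sum;
  }
}

// Fused K/V projection. Assumption: `w_kv` holds the qkv_dim rows of the K
// weights followed directly by the qkv_dim rows of the V weights.
// Returns one vector of length 2 * qkv_dim laid out as [k | v].
std::vector<float> ProjectKVFused(const std::vector<float>& w_kv,
                                  std::size_t qkv_dim, std::size_t model_dim,
                                  const std::vector<float>& x) {
  std::vector<float> kv(2 * qkv_dim);
  // One call over 2 * qkv_dim rows instead of two calls over qkv_dim rows.
  MatVecSketch(w_kv, 2 * qkv_dim, model_dim, x, kv.data());
  return kv;
}
```

With this layout, k occupies the first `qkv_dim` entries of the result and v the remaining `qkv_dim`, so downstream code can address both halves by offset.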

@jan-wassenberg (Member) left a comment

Nice!

@jan-wassenberg added the copybara-import label (Trigger Copybara for merging pull requests) on Apr 30, 2024
@copybara-service[bot] merged commit 374fd74 into google:dev on Apr 30, 2024
@szabadka deleted the gemma2 branch on April 30, 2024, 14:46
@jan-wassenberg (Member)

FYI, we are working on a fix for this change; it breaks 7B (MHA).

@jan-wassenberg (Member)

#172.
