
Use more parallelism in the QKV projections of the MHA block. #176

Merged: copybara-service[bot] merged 1 commit into google:dev from szabadka:gemma3 on May 2, 2024


@szabadka (Collaborator) commented May 2, 2024

We compute all three projections (Q, K and V) with a single MatVec, which gives
the threads one larger workload to split, and then copy the KV part into the
KV cache.
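A minimal C++ sketch of the idea, assuming a row-major weight layout `[heads][3][qkv_dim][model_dim]` (Q, K and V interleaved per head) and a KV cache slot laid out as `[heads][k | v]` for one token. `ParallelMatVec` and `FusedQKV` are hypothetical names, and the plain `std::thread` row split stands in for the project's actual MatVec and thread pool. The point it illustrates: one MatVec over all `3 * heads * qkv_dim` rows gives the threads a single large batch of rows to divide, after which each head's K and V rows are copied into the cache.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

using std::size_t;

// Hypothetical parallel MatVec: out[r] = dot(row r of mat, vec) for every row.
// Rows are split into contiguous chunks, one chunk per thread.
void ParallelMatVec(const std::vector<float>& mat, const std::vector<float>& vec,
                    size_t rows, size_t cols, std::vector<float>& out,
                    size_t num_threads) {
  assert(mat.size() == rows * cols && vec.size() == cols);
  num_threads = std::max<size_t>(1, num_threads);
  out.assign(rows, 0.0f);
  const size_t chunk = (rows + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (size_t t = 0; t < num_threads; ++t) {
    const size_t begin = t * chunk;
    const size_t end = std::min(rows, begin + chunk);
    if (begin >= end) break;  // Fewer chunks than threads.
    workers.emplace_back([&, begin, end] {
      for (size_t r = begin; r < end; ++r) {
        float sum = 0.0f;
        for (size_t c = 0; c < cols; ++c) sum += mat[r * cols + c] * vec[c];
        out[r] = sum;  // Each thread writes a disjoint range of rows.
      }
    });
  }
  for (std::thread& w : workers) w.join();
}

// Assumed layout: weights are [heads][3][qkv_dim][model_dim], so the MatVec
// output is [heads][q | k | v]; kv_cache_slot is [heads][k | v] for one token.
void FusedQKV(const std::vector<float>& qkv_weights, const std::vector<float>& x,
              size_t heads, size_t qkv_dim, size_t model_dim, size_t num_threads,
              std::vector<float>& q, std::vector<float>& kv_cache_slot) {
  // One MatVec over all 3 * heads * qkv_dim rows gives the threads more rows to split.
  std::vector<float> qkv;
  ParallelMatVec(qkv_weights, x, 3 * heads * qkv_dim, model_dim, qkv, num_threads);
  q.resize(heads * qkv_dim);
  kv_cache_slot.resize(heads * 2 * qkv_dim);
  for (size_t h = 0; h < heads; ++h) {
    const float* head_out = qkv.data() + h * 3 * qkv_dim;
    // Q goes to a scratch buffer for the attention step.
    std::memcpy(q.data() + h * qkv_dim, head_out, qkv_dim * sizeof(float));
    // K and V of this head are adjacent, so one copy moves both into the cache.
    std::memcpy(kv_cache_slot.data() + h * 2 * qkv_dim, head_out + qkv_dim,
                2 * qkv_dim * sizeof(float));
  }
}
```

With this interleaved layout the per-head copy into the cache is a single contiguous block; a layout that stacks all Q rows before all K and V rows would instead allow one copy for the whole KV part.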

Benchmark results for the 7b-it model, which uses MHA blocks (prefill measured
on summarization with 1600 tokens, generation measured on essay writing with
500 tokens):

```
                 Prefill speed (tokens/s)    Generation speed (tokens/s)
Num threads      BEFORE      AFTER           BEFORE      AFTER
32               13.75       14.80           9.22        9.77
64               19.89       24.83           12.46       13.66
```
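To put the benchmark table in relative terms, the small sketch below computes the throughput gain of AFTER over BEFORE for each row. The numbers are copied from the table; `SpeedupPercent`, `BenchRow` and `PrintSpeedups` are illustrative names only.

```cpp
#include <cassert>
#include <cstdio>

// Relative throughput gain of AFTER over BEFORE, in percent.
double SpeedupPercent(double before, double after) {
  assert(before > 0.0);
  return (after / before - 1.0) * 100.0;
}

struct BenchRow {
  int threads;
  double prefill_before, prefill_after;  // tokens/s
  double gen_before, gen_after;          // tokens/s
};

// Values copied from the benchmark table.
const BenchRow kRows[] = {
    {32, 13.75, 14.80, 9.22, 9.77},
    {64, 19.89, 24.83, 12.46, 13.66},
};

void PrintSpeedups() {
  for (const BenchRow& r : kRows) {
    std::printf("%d threads: prefill %+.1f%%, generation %+.1f%%\n", r.threads,
                SpeedupPercent(r.prefill_before, r.prefill_after),
                SpeedupPercent(r.gen_before, r.gen_after));
  }
}
```

This works out to roughly +7.6% prefill and +6.0% generation at 32 threads, and +24.8% prefill and +9.6% generation at 64 threads. The gain grows with the thread count: going from 32 to 64 threads raised prefill throughput by about 45% before the change and by about 68% after it, which is consistent with the change exposing more parallel work.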
@jan-wassenberg (Member) left a comment

Great to get rid of MatVecLoop! Just to confirm that the MQA 2B still works?

@szabadka (Collaborator, Author) commented May 2, 2024

> Great to get rid of MatVecLoop! Just to confirm that the MQA 2B still works?

Yes, I tested that it still works.
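The reviewer's question comes down to how K/V heads are counted. In multi-head attention (MHA) each query head has its own K and V head, while multi-query attention (MQA) shares a single K/V head across all query heads, so a fused projection has fewer K/V rows and less data goes into the cache per token. The hypothetical helpers below (not from the PR) make the counts explicit, assuming a fused projection stacks `heads` Q blocks plus `kv_heads` K blocks and `kv_heads` V blocks of `qkv_dim` rows each.

```cpp
#include <cassert>
#include <cstddef>

using std::size_t;

// Hypothetical helper: output rows of a fused QKV MatVec.
// kv_heads == heads is MHA; kv_heads == 1 is MQA.
size_t FusedQKVRows(size_t heads, size_t kv_heads, size_t qkv_dim) {
  assert(kv_heads >= 1 && kv_heads <= heads && heads % kv_heads == 0);
  return (heads + 2 * kv_heads) * qkv_dim;
}

// Hypothetical helper: floats written to the KV cache per token and layer.
size_t KVCacheFloatsPerToken(size_t kv_heads, size_t qkv_dim) {
  return 2 * kv_heads * qkv_dim;
}
```

With illustrative sizes of 8 query heads and `qkv_dim = 256`, MHA needs 6144 output rows and MQA 2560, and each token adds 4096 versus 512 floats to the cache. A fused path that serves both variants has to size the MatVec and the cache copy from these counts.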

@jan-wassenberg added the copybara-import label (Trigger Copybara for merging pull requests) on May 2, 2024
copybara-service[bot] merged commit 2a71333 into google:dev on May 2, 2024
@szabadka deleted the gemma3 branch on May 3, 2024 06:04


2 participants