Use more parallelism in the QKV projections of the MHA block. by szabadka · Pull Request #176 · google/gemma.cpp

szabadka · 2024-05-02T14:11:06Z

We compute all three projections with one MatVec and then copy the kv part to the cache.

Benchmark results for the 7b-it model, which uses MHA blocks (summarization with a 1600-token prefill and essay writing with 500 generated tokens):

                 Prefill speed             Generation speed
Num threads      BEFORE       AFTER        BEFORE       AFTER
32               13.75 t/s    14.80 t/s    9.22 t/s     9.77 t/s
64               19.89 t/s    24.83 t/s    12.46 t/s    13.66 t/s
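
For readers who have not seen the diff, the idea in the description (one MatVec for all three projections, then copying the kv part to the cache) could be sketched roughly as below. This is a minimal illustration, not the code in gemma/gemma.cc: the names `MatVecSketch` and `FusedQKV`, the row-major weight layout with each head's Q, K and V rows stacked together, and the flat cache layout are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: plain row-major matrix-vector product,
// out[r] = dot(w[r, :], x). Every output row is independent, so a single
// call over many rows exposes more work to split across threads than
// several small per-head calls would.
void MatVecSketch(const float* w, const float* x, size_t rows, size_t cols,
                  float* out) {
  for (size_t r = 0; r < rows; ++r) {
    float sum = 0.0f;
    for (size_t c = 0; c < cols; ++c) sum += w[r * cols + c] * x[c];
    out[r] = sum;
  }
}

// Hypothetical fused projection for one token.
// Assumed layout: w has heads * 3 * qkv_dim rows and model_dim columns;
// for each head the rows are ordered Q, then K, then V.
// kv_cache_pos points at this token's slot, holding K and V per head.
void FusedQKV(const std::vector<float>& w, const std::vector<float>& x,
              size_t heads, size_t qkv_dim, size_t model_dim,
              std::vector<float>& qkv, float* kv_cache_pos) {
  const size_t rows = heads * 3 * qkv_dim;
  qkv.resize(rows);
  // One MatVec computes Q, K and V for all heads.
  MatVecSketch(w.data(), x.data(), rows, model_dim, qkv.data());
  // Copy only the K and V parts of each head into the cache.
  for (size_t h = 0; h < heads; ++h) {
    const float* kv = qkv.data() + (h * 3 + 1) * qkv_dim;
    std::memcpy(kv_cache_pos + h * 2 * qkv_dim, kv,
                2 * qkv_dim * sizeof(float));
  }
}
```

In this sketch the copy costs `2 * qkv_dim` floats per head per token, which is small next to the matrix-vector product itself.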


jan-wassenberg

Great to get rid of MatVecLoop! Just to confirm that the MQA 2B still works?

gemma/gemma.cc

szabadka · 2024-05-02T16:25:46Z

> Great to get rid of MatVecLoop! Just to confirm that the MQA 2B still works?

Yes, I tested that it still works.

jan-wassenberg reviewed May 2, 2024


jan-wassenberg added the copybara-import label (Trigger Copybara for merging pull requests) on May 2, 2024

copybara-service bot merged commit 2a71333 into google:dev May 2, 2024

szabadka deleted the gemma3 branch May 3, 2024 06:04
