Use more parallelism in the final output of the attention block. by szabadka · Pull Request #175 · google/gemma.cpp

szabadka · 2024-05-02T09:50:01Z

We use MatVec instead of MatVecLoop for the per-head dense layers, because the matrix has more rows than the attention block has heads, so parallelizing over rows exposes more parallelism than parallelizing over heads. This will be even more efficient once we rearrange the weights so that a single MatVec operation suffices.

Benchmark results (summarization with 1600 tokens for prefill and essay writing with 500 tokens for generation):

                 Prefill speed             Generation speed
Num threads      BEFORE       AFTER        BEFORE       AFTER
32               58.24 t/s    61.79 t/s    32.11 t/s    32.62 t/s
64               83.62 t/s    92.00 t/s    41.10 t/s    41.80 t/s
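
For readers without the diff at hand, the difference between the two strategies can be sketched roughly as follows. This is a minimal illustration, not the gemma.cpp code: `ParallelFor`, `AttnOutput`, `OutputPerHead` and `OutputPerRow` are hypothetical names, the weight layout is assumed, and plain `std::thread` stands in for the real thread pool. It does not reproduce the actual `MatVec` / `MatVecLoop` signatures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not gemma.cpp's API): runs fn(i) for i in [0, n) on up
// to num_threads threads, giving each thread a contiguous range of tasks.
// A real implementation would reuse a persistent pool instead of spawning.
template <typename Fn>
void ParallelFor(size_t n, size_t num_threads, const Fn& fn) {
  const size_t workers = std::max<size_t>(1, std::min(n, num_threads));
  const size_t chunk = (n + workers - 1) / workers;
  std::vector<std::thread> threads;
  for (size_t t = 0; t < workers; ++t) {
    const size_t begin = t * chunk;
    const size_t end = std::min(n, begin + chunk);
    if (begin >= end) break;
    threads.emplace_back([begin, end, &fn] {
      for (size_t i = begin; i < end; ++i) fn(i);
    });
  }
  for (std::thread& th : threads) th.join();
}

// Assumed layout: `weights` holds num_heads row-major matrices back to back,
// each with model_dim rows and head_dim columns; `att_out` holds num_heads
// vectors of head_dim values.
struct AttnOutput {
  size_t num_heads, model_dim, head_dim;
  std::vector<float> weights;
  std::vector<float> att_out;
};

float Dot(const float* a, const float* b, size_t n) {
  float sum = 0.0f;
  for (size_t i = 0; i < n; ++i) sum += a[i] * b[i];
  return sum;
}

// One task per head: at most num_heads threads can be busy, and each one
// computes a whole matrix-vector product sequentially.
std::vector<float> OutputPerHead(const AttnOutput& a, size_t num_threads) {
  assert(a.weights.size() == a.num_heads * a.model_dim * a.head_dim);
  assert(a.att_out.size() == a.num_heads * a.head_dim);
  std::vector<float> per_head(a.num_heads * a.model_dim);
  ParallelFor(a.num_heads, num_threads, [&](size_t h) {
    const float* w = &a.weights[h * a.model_dim * a.head_dim];
    const float* x = &a.att_out[h * a.head_dim];
    for (size_t r = 0; r < a.model_dim; ++r) {
      per_head[h * a.model_dim + r] = Dot(w + r * a.head_dim, x, a.head_dim);
    }
  });
  // Sum the per-head results into the final output.
  std::vector<float> out(a.model_dim, 0.0f);
  for (size_t h = 0; h < a.num_heads; ++h) {
    for (size_t r = 0; r < a.model_dim; ++r) {
      out[r] += per_head[h * a.model_dim + r];
    }
  }
  return out;
}

// One task per row: heads are visited sequentially, but each matrix-vector
// product is split across model_dim rows, so far more threads can be busy.
std::vector<float> OutputPerRow(const AttnOutput& a, size_t num_threads) {
  assert(a.weights.size() == a.num_heads * a.model_dim * a.head_dim);
  assert(a.att_out.size() == a.num_heads * a.head_dim);
  std::vector<float> out(a.model_dim, 0.0f);
  for (size_t h = 0; h < a.num_heads; ++h) {
    const float* w = &a.weights[h * a.model_dim * a.head_dim];
    const float* x = &a.att_out[h * a.head_dim];
    // Each row r is written by exactly one thread, so no locking is needed.
    ParallelFor(a.model_dim, num_threads, [&](size_t r) {
      out[r] += Dot(w + r * a.head_dim, x, a.head_dim);
    });
  }
  return out;
}
```

Both functions compute the same sum over heads. With the per-head variant the number of independent tasks is capped by the head count, whereas the per-row variant offers one task per output row, which is why adding threads can help more there.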


jan-wassenberg

Nice progress, thanks!

jan-wassenberg approved these changes May 2, 2024


jan-wassenberg added the copybara-import label (Trigger Copybara for merging pull requests) May 2, 2024

copybara-service bot merged commit bafb838 into google:dev May 2, 2024

szabadka deleted the gemma2 branch May 3, 2024 06:04
