Use more parallelism in the final output of the attention block. #175

Merged
copybara-service[bot] merged 1 commit into google:dev from szabadka:gemma2
May 2, 2024

@szabadka (Collaborator) commented on May 2, 2024

We use MatVec instead of MatVecLoop for the per-head dense layers, because parallelizing over the rows of the matrix exposes more parallelism than parallelizing over the heads. This will become even more efficient once we rearrange the weights and can use a single MatVec operation.
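
To illustrate the idea, here is a minimal sketch of the two strategies. It is not the gemma.cpp code: `ParallelFor`, `Shape`, `OutputParallelOverHeads`, and `OutputParallelOverRows` are hypothetical names, the weight layout is an assumption, and plain `std::thread` stands in for the real thread pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not a gemma.cpp API): calls fn(i) for every i in
// [0, n), splitting the range into contiguous chunks across num_threads.
template <typename Fn>
void ParallelFor(std::size_t n, std::size_t num_threads, const Fn& fn) {
  assert(num_threads > 0);
  const std::size_t chunk = (n + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < num_threads; ++t) {
    const std::size_t begin = std::min(n, t * chunk);
    const std::size_t end = std::min(n, begin + chunk);
    if (begin == end) break;  // fewer work items than threads
    workers.emplace_back([begin, end, &fn] {
      for (std::size_t i = begin; i < end; ++i) fn(i);
    });
  }
  for (std::thread& w : workers) w.join();
}

// Assumed layout: w is [num_heads][model_dim][head_dim] (row-major),
// att is [num_heads][head_dim].
struct Shape {
  std::size_t num_heads, head_dim, model_dim;
};

// Dot product of row r of head h's matrix with head h's input vector.
float RowDot(const Shape& s, const std::vector<float>& w,
             const std::vector<float>& att, std::size_t h, std::size_t r) {
  float sum = 0.0f;
  for (std::size_t c = 0; c < s.head_dim; ++c) {
    sum += w[(h * s.model_dim + r) * s.head_dim + c] * att[h * s.head_dim + c];
  }
  return sum;
}

// Strategy A: one task per head, so at most num_heads threads are busy.
std::vector<float> OutputParallelOverHeads(const Shape& s,
                                           const std::vector<float>& w,
                                           const std::vector<float>& att,
                                           std::size_t num_threads) {
  std::vector<float> per_head(s.num_heads * s.model_dim, 0.0f);
  ParallelFor(s.num_heads, num_threads, [&](std::size_t h) {
    for (std::size_t r = 0; r < s.model_dim; ++r) {
      per_head[h * s.model_dim + r] = RowDot(s, w, att, h, r);
    }
  });
  // Serial reduction of the per-head results.
  std::vector<float> out(s.model_dim, 0.0f);
  for (std::size_t h = 0; h < s.num_heads; ++h) {
    for (std::size_t r = 0; r < s.model_dim; ++r) {
      out[r] += per_head[h * s.model_dim + r];
    }
  }
  return out;
}

// Strategy B: heads are visited one after another, and each mat-vec is
// split over its model_dim rows, so up to model_dim tasks are available.
std::vector<float> OutputParallelOverRows(const Shape& s,
                                          const std::vector<float>& w,
                                          const std::vector<float>& att,
                                          std::size_t num_threads) {
  std::vector<float> out(s.model_dim, 0.0f);
  for (std::size_t h = 0; h < s.num_heads; ++h) {
    ParallelFor(s.model_dim, num_threads, [&](std::size_t r) {
      out[r] += RowDot(s, w, att, h, r);  // each row has a single writer
    });
  }
  return out;
}
```

With, say, 8 heads and 64 threads, strategy A can keep at most 8 threads busy, while strategy B can distribute thousands of rows. Strategy B still runs one parallel region per head, which is presumably the overhead that a single MatVec over rearranged weights would remove.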

Benchmark results (summarization with 1600 tokens for prefill and essay writing with 500 tokens for generation):

                 Prefill speed               Generation speed
Num threads      BEFORE       AFTER          BEFORE       AFTER
32               58.24 t/s    61.79 t/s      32.11 t/s    32.62 t/s
64               83.62 t/s    92.00 t/s      41.10 t/s    41.80 t/s
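
For reading the table, the relative gain can be computed as `after / before - 1`. The helper below is only an illustrative sketch; `SpeedupPercent` is a hypothetical name and not part of the benchmark tooling.

```cpp
// Hypothetical helper: relative throughput gain in percent,
// given tokens-per-second figures before and after a change.
double SpeedupPercent(double before_tps, double after_tps) {
  return (after_tps / before_tps - 1.0) * 100.0;
}
```

Applied to the numbers above, this gives roughly 6.1% (32 threads) and 10.0% (64 threads) for prefill, and about 1.6% and 1.7% for generation.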

@jan-wassenberg (Member) left a comment

Nice progress, thanks!

@jan-wassenberg added the copybara-import label (Trigger Copybara for merging pull requests) on May 2, 2024
@copybara-service[bot] merged commit bafb838 into google:dev on May 2, 2024
@szabadka deleted the gemma2 branch on May 3, 2024, 06:04