Factor out deinterleaving of bf16 vectors for MatVecs. #166

copybara-service[bot] merged 9 commits into google:dev
Conversation
jan-wassenberg left a comment

Nice, great to see this coming together :) Thanks for sending the PR. Some suggestions:
gemma/ops.h (outdated):

    const hn::ScalableTag<float> df;
    ...
    const auto vec_dequant = hwy::AllocateAligned<float>(kInner);
Allocation can be quite slow, let's move this into gemma.cc's Activations. That would require plumbing through an extra tmp arg here, and the std::array storage should probably be the largest per-call size * max number of threads (say 128 or 256). Would you prefer if I made this change?
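A minimal sketch of the suggested change, assuming hypothetical sizes and the even_odd name mentioned later in the thread (not the actual gemma.cc declarations):

```cpp
#include <cstddef>

#include "hwy/aligned_allocator.h"

// Assumed bounds: largest per-call kInner times the suggested maximum
// thread count (128 or 256, per the comment above). Both hypothetical.
constexpr size_t kMaxInner = 2048;
constexpr size_t kMaxThreads = 128;

struct Activations {
  // One float strip per thread, allocated once up front so the hot
  // MatVec path never calls hwy::AllocateAligned itself.
  hwy::AlignedFreeUniquePtr<float[]> even_odd =
      hwy::AllocateAligned<float>(kMaxThreads * kMaxInner);
};

// A MatVec running on worker `tid` would then receive a tmp arg like:
//   float* HWY_RESTRICT tmp = activations.even_odd.get() + tid * kMaxInner;
```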
@jan-wassenberg Sure. Thanks for the help!
@jan-wassenberg Thank you for reviewing! I'll branch on native BF16 support and clean up those near-duplicate MatVecAdd implementations, then turn off this PR's draft bit.

@jan-wassenberg One more question: what's the best way to check that the target doesn't have native bf16 dot-product support (e.g., AVX512_BF16)? You previously pointed me at a Highway PR, but it looks like Copybara scrubbed the branch when the PR was dropped.
We can check …
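A sketch of one such guard, assuming Highway's HWY_NATIVE_DOT_BF16 macro, which Highway defines per target when native bf16 dot support exists:

```cpp
#include "hwy/highway.h"

// Defined by Highway only on targets with native bf16 dot/widening
// multiplies (e.g. AVX512_BF16-class hardware); elsewhere we take the
// deinterleave-to-f32 fallback.
#ifdef HWY_NATIVE_DOT_BF16
constexpr bool kNativeBF16 = true;
#else
constexpr bool kNativeBF16 = false;
#endif
```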
gemma/ops.h (outdated):

    // vector to even-odd layout.
    template <bool kAdd, size_t kOuter, size_t kInner, typename ArrayT,
              typename VecT, typename AddT,
              std::enable_if_t<
Consider replacing with HWY_IF_SAME2(VecT, float, hwy::bfloat16_t).
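Applied to the signature quoted above, the suggestion would read roughly like this (a sketch; the function name and parameters are assumed, not quoted from the PR):

```cpp
template <bool kAdd, size_t kOuter, size_t kInner, typename ArrayT,
          typename VecT, typename AddT,
          // Replaces the std::enable_if_t clause: per the suggestion, this
          // helper accepts VecT being either float or hwy::bfloat16_t.
          HWY_IF_SAME2(VecT, float, hwy::bfloat16_t)>
HWY_NOINLINE void MatVecAdd(const ArrayT& mat, const VecT* vec,
                            const AddT* add, float* out);
```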
Also inline ProjQ and ProjKV lambdas, add missing includes/deps for ops_test.

PiperOrigin-RevId: 629460608
@jan-wassenberg Done. Native bf16 checks added. Additionally, 59ebecc fixes a bug I introduced in 6a78a23: that commit affected overload resolution such that the specialization was never called. That's now fixed by moving the bulk of MatVecAdd into detail::MatVecAddInner and selecting between the even-odd and linear layouts inside an if constexpr, which ensures the choice is downstream of MatVecAdd's type inference.
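A sketch of that structure, under assumed and simplified signatures:

```cpp
#include "hwy/base.h"

namespace detail {

// Single inner entry point: the layout is chosen inside `if constexpr`,
// i.e. only after VecT has been deduced, so the even-odd path can no
// longer be skipped by overload resolution.
template <bool kAdd, size_t kOuter, size_t kInner, typename ArrayT,
          typename VecT>
void MatVecAddInner(const ArrayT& mat, const VecT* vec, float* out) {
  if constexpr (hwy::IsSame<VecT, hwy::bfloat16_t>()) {
    // bf16 input: deinterleave into f32 even/odd strips, then accumulate.
  } else {
    // float input: use the linear layout directly.
  }
}

}  // namespace detail
```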
I see even_odd storage is merged to dev. I'll merge.
jan-wassenberg left a comment
Nice, looks good to me, thanks for updating! Can you give it a quick sanity check with sfp weights as well (e.g. those prefixed 1.1 on Kaggle) to make sure that still works?
Already did. Works great.

Thanks for confirming :D

Internal CI caught some unused vars. Please fix :)
samkaufman left a comment
Oops. Hopefully that sorts it.
Remove extra Dot() overload. MatVecAdd always adds; use MatVecT<kAdd> if conditional. Remove unused MatVecAddLoop and MatVecLoop. No longer tsan-verify even_odd.

PiperOrigin-RevId: 631377279
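The MatVecT<kAdd> consolidation this commit describes presumably has roughly this shape (a sketch with simplified, assumed signatures):

```cpp
// One body, parameterized on whether the bias vector is fused in.
template <bool kAdd, typename VecT>
void MatVecT(const float* mat, const VecT* vec, const float* add,
             float* out) {
  // ...shared matvec body; only this step differs per output row r:
  // if constexpr (kAdd) out[r] += add[r];
}

// Thin wrappers replace the near-duplicate implementations:
template <typename VecT>
void MatVecAdd(const float* mat, const VecT* vec, const float* add,
               float* out) {
  MatVecT</*kAdd=*/true>(mat, vec, add, out);
}

template <typename VecT>
void MatVec(const float* mat, const VecT* vec, float* out) {
  MatVecT</*kAdd=*/false>(mat, vec, /*add=*/nullptr, out);
}
```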
Disable it for float32 because there is not enough benefit.

PiperOrigin-RevId: 631788326
This specializes bf16-f32 and bf16-bf16 vector-matrix multiplications to first convert bf16 vectors into f32 buffers of vector-length strips of even- and odd-indexed values.
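The conversion step might look roughly like the following sketch built on Highway's PromoteEvenTo/PromoteOddTo; the helper name and exact shape here are assumptions, not the PR's code:

```cpp
#include "hwy/highway.h"

namespace hn = hwy::HWY_NAMESPACE;

// Widen a bf16 vector of `inner` elements (assumed to be a multiple of
// two f32 vectors) into f32 storage laid out as alternating
// vector-length strips of even- and odd-indexed lanes.
HWY_INLINE void ToEvenOddF32(const hwy::bfloat16_t* HWY_RESTRICT vec,
                             size_t inner, float* HWY_RESTRICT out) {
  const hn::ScalableTag<float> df;
  const hn::Repartition<hwy::bfloat16_t, decltype(df)> dbf;
  const size_t NF = hn::Lanes(df);
  for (size_t i = 0; i < inner; i += 2 * NF) {
    const auto vbf = hn::LoadU(dbf, vec + i);                // 2*NF bf16 lanes
    hn::Store(hn::PromoteEvenTo(df, vbf), df, out + i);      // even strip
    hn::Store(hn::PromoteOddTo(df, vbf), df, out + i + NF);  // odd strip
  }
}
```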
The 2B, bf16 model running on my Zen 1 machine sees ~10% throughput improvements to single-threaded prefill, single-threaded generation, and multi-threaded prefill, but only a marginal improvement to multi-threaded generation throughput.
This PR does not implement support for SFP.
TODOs: