Fix: Quantized Gemma3 was missing Sliding Window attention and GELU activations #3326

Open

DrJesseGlass wants to merge 3 commits into huggingface:main from DrJesseGlass:fix-quantized-gemma3




DrJesseGlass commented Jan 23, 2026

Changes the quantized Gemma3 MLP activation from SiLU to GELU (tanh approximation) to match the model's `hidden_activation: gelu_pytorch_tanh` config.
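
For reference, here is a minimal plain-Rust sketch (scalars rather than candle tensors) of the two activations and of a Gemma-style gated MLP hidden state. It assumes the usual Gemma layout `down(act(gate(x)) * up(x))`. The function names `gelu_tanh`, `silu` and `gated_hidden` are hypothetical and are not part of candle's API.

```rust
/// Tanh approximation of GELU, which is what `gelu_pytorch_tanh` refers to:
/// 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
fn gelu_tanh(x: f32) -> f32 {
    let c = (2.0f32 / std::f32::consts::PI).sqrt();
    0.5 * x * (1.0 + (c * (x + 0.044715 * x * x * x)).tanh())
}

/// SiLU (swish): x * sigmoid(x), the activation the quantized model used before this fix.
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Hidden state of a gated MLP, act(gate(x)) * up(x), computed elementwise.
/// The gate/up projection outputs are passed in directly; the down projection is omitted.
fn gated_hidden(gate: &[f32], up: &[f32], act: fn(f32) -> f32) -> Vec<f32> {
    assert_eq!(gate.len(), up.len(), "gate and up outputs must have the same width");
    gate.iter().zip(up).map(|(&g, &u)| act(g) * u).collect()
}
```

The two activations differ noticeably over the typical input range (e.g. about 0.841 vs 0.731 at x = 1, and SiLU dips much further below zero for negative inputs). Because the weights were trained with GELU, substituting SiLU changes every MLP output even though all shapes and code paths remain valid, so nothing fails loudly.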

Issue #3299 reported strange behavior in quantized_gemma3, which was traced to sliding-window attention: the existing quantized_gemma3 implementation did not implement the sliding window at all. This PR ports the sliding-window cache design from the non-quantized gemma3 model, which already works correctly, into quantized_gemma3. The fix has been verified to work.
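
As a conceptual illustration only, here is a plain-Rust sketch of the three pieces sliding-window attention involves: choosing which layers are local, masking keys outside the window, and a cache that evicts entries older than the window. Everything here is hypothetical (`is_sliding_layer`, `attention_mask`, `SlidingKvCache` are not candle APIs). The layer rule assumes a Gemma3-style 5:1 local/global interleave where every `pattern`-th layer (6 in that layout) is global. The mask convention (a query sees itself plus the previous `window - 1` keys) is also an assumption, since implementations differ by one here.

```rust
use std::collections::VecDeque;

/// Returns true if layer `layer_idx` (0-based) uses local sliding-window attention,
/// assuming every `pattern`-th layer is global and the rest are local.
fn is_sliding_layer(layer_idx: usize, pattern: usize) -> bool {
    (layer_idx + 1) % pattern != 0
}

/// Boolean attention mask: `mask[i][j]` is true when query `i` may attend to key `j`.
/// `None` gives a plain causal mask; `Some(w)` also hides keys `w` or more positions back.
fn attention_mask(seq_len: usize, window: Option<usize>) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|i| {
            (0..seq_len)
                .map(|j| {
                    let causal = j <= i;
                    // `j + w > i` is `i - j < w` written without unsigned underflow.
                    let in_window = window.map_or(true, |w| j + w > i);
                    causal && in_window
                })
                .collect()
        })
        .collect()
}

/// Minimal rotating cache that keeps only the most recent `window` entries,
/// each tagged with its absolute token position.
struct SlidingKvCache<T> {
    window: usize,
    entries: VecDeque<(usize, T)>,
}

impl<T> SlidingKvCache<T> {
    fn new(window: usize) -> Self {
        assert!(window > 0, "window must be positive");
        Self { window, entries: VecDeque::with_capacity(window) }
    }

    /// Appends the key/value for position `pos`, evicting the oldest entry when full.
    fn push(&mut self, pos: usize, kv: T) {
        if self.entries.len() == self.window {
            self.entries.pop_front();
        }
        self.entries.push_back((pos, kv));
    }

    /// Absolute positions currently held in the cache, oldest first.
    fn positions(&self) -> Vec<usize> {
        self.entries.iter().map(|(pos, _)| *pos).collect()
    }
}
```

Without the window, a local layer attends to the entire context, which is the kind of silent behavioral drift described in #3299 rather than a hard error.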

Impact

Switching from SiLU to GELU gave an immediate improvement in output quality for models derived from Gemma3, notably TranslateGemma. With SiLU, TranslateGemma produced partially untranslated output (English tokens leaking through); with GELU, translations are fully in the target language.

Testing

Tested with TranslateGemma-4b-it (Q4_K_M quantization):

  • English → Swahili with SiLU: Mixed English/Swahili with untranslated phrases leaking through, mistranslations ("artificial intelligence" → "Kiwanda" [factory]), and incorrect word choices ("mtu" instead of "binadamu" for "human")
  • English → Swahili with GELU: Coherent fully-translated output with correct terminology ("siku zijazo" for "the future", "binadamu" for "human being")

DrJesseGlass marked this pull request as ready for review on January 23, 2026 at 20:55
DrJesseGlass changed the title from "Fix: Use GELU activation in quantized Gemma3 (matches config)" to "Fix: Quantized Gemma3 was missing Sliding Window attention and GELU activations" on Jan 28, 2026
