Fix: Quantized Gemma3 was missing Sliding Window attention and GELU activations #3326
Open
DrJesseGlass wants to merge 3 commits into huggingface:main from
Conversation
Changes the quantized Gemma3 MLP activation from SiLU to GELU to match the model's `hidden_activation: gelu_pytorch_tanh` config.

Issue #3299 reported strange behavior in quantized_gemma3, which was narrowed down to sliding window attention: the existing quantized_gemma3 implementation did not implement it at all. This PR ports the sliding-window cache design from the working gemma3 implementation into quantized_gemma3, and the result has been verified to work.
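For reference, a minimal standalone sketch (plain Rust, not the candle code in this PR) of the two behaviors involved: the tanh-approximated GELU used by `gelu_pytorch_tanh`, and a causal mask restricted to a sliding window. The function names, the window semantics (a token attends to itself and the previous `window - 1` positions), and the boolean-mask representation are illustrative assumptions, not the actual API.

```rust
// Sketch only: illustrates the activations and the sliding-window causal mask,
// not the candle implementation.

/// gelu_pytorch_tanh: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
fn gelu_tanh(x: f32) -> f32 {
    let c = (2.0_f32 / std::f32::consts::PI).sqrt();
    0.5 * x * (1.0 + (c * (x + 0.044715 * x * x * x)).tanh())
}

/// SiLU (what the quantized model used before this fix): x * sigmoid(x)
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Causal mask limited to a sliding window: position i may attend to position j
/// only if j <= i and i - j < window. `true` means "masked out".
fn sliding_window_mask(seq_len: usize, window: usize) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|i| (0..seq_len).map(|j| j > i || i - j >= window).collect())
        .collect()
}

fn main() {
    // The two activations diverge noticeably for negative inputs.
    for x in [-2.0_f32, -0.5, 0.5, 2.0] {
        println!("x = {x:5.2}  silu = {:7.4}  gelu_tanh = {:7.4}", silu(x), gelu_tanh(x));
    }
    // With a window of 3, token 4 can see tokens 2..=4 but not 0 or 1.
    for row in sliding_window_mask(6, 3) {
        let line: String = row.iter().map(|&m| if m { 'x' } else { '.' }).collect();
        println!("{line}");
    }
}
```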
Impact
The SiLU-to-GELU change gives an immediate improvement in output quality for models that inherit from Gemma3, notably TranslateGemma. With SiLU, TranslateGemma produced partially untranslated output (English tokens leaking through); with GELU, translations are fully in the target language.
Testing
Tested with TranslateGemma-4b-it (Q4_K_M quantization):