Fix GLM-4.5V-FP8 numerical issue #22949
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Code Review
This pull request fixes a numerical issue with the GLM-4.5V-FP8 model by correcting the prefix for the QKV projection layer during quantization. The change is conditional on the presence of a quantization configuration, which correctly handles the pre-quantized FP8 model. However, this approach is too broad and could unintentionally affect on-the-fly quantization for other models. I've provided a suggestion to make the condition more specific to FP8 models to prevent potential regressions.
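For illustration, a minimal sketch of what the suggested narrower condition could look like, assuming the quantization config object exposes `get_name()` as vLLM quantization configs do; `remap_qkv_prefix` is a hypothetical helper, not the literal suggested diff:

```python
def remap_qkv_prefix(attn_prefix: str, quant_config) -> str:
    """Remap the vision QKV prefix only for FP8-style checkpoints."""
    # Note: the discussion below points out that GLM-4.5V-FP8 actually
    # reports quant_method == "compressed-tensors", so which method names
    # to gate on is exactly what this thread debates.
    is_fp8_checkpoint = (
        quant_config is not None and quant_config.get_name() == "fp8"
    )
    return f"{attn_prefix}.qkv_proj" if is_fp8_checkpoint else f"{attn_prefix}.qkv"
```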
@vladmihailescu has imported this pull request. If you are a Meta employee, you can view this in D80315216.
any update?
Force-pushed from df736bd to bbfc684.
Force-pushed from bbfc684 to 7f49211.
Oh, I just noticed we don't have a GLM-4.5V example in examples/offline_inference/vision_language.py or examples/offline_inference/vision_language_multi_image.py. Let's consolidate this example into those two.
Done. Updated examples/offline_inference/vision_language.py and examples/offline_inference/vision_language_multi_image.py to include GLM-4.5V / GLM-4.5V-FP8, and removed the additional script.
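For reference, a minimal standalone sketch of the kind of offline-inference usage those examples cover. The image URL, context length, and parallelism settings are illustrative placeholders; the real example files use the repository's shared helpers:

```python
from vllm import LLM, SamplingParams

# Placeholder image URL for illustration only.
IMAGE_URL = "https://example.com/cherry_blossom.jpg"

llm = LLM(
    model="zai-org/GLM-4.5V-FP8",   # or "zai-org/GLM-4.5V" for BF16
    max_model_len=8192,
    limit_mm_per_prompt={"image": 1},
    tensor_parallel_size=8,         # adjust to your hardware; the model is large
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": IMAGE_URL}},
        {"type": "text", "text": "Describe this image."},
    ],
}]

outputs = llm.chat(messages, SamplingParams(temperature=0.0, max_tokens=128))
print(outputs[0].outputs[0].text)
```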
Can we also check quant_config's quantize_method to see whether it's FP8 quantization? It could be another type of quantization.
I think the issue we are fixing is the mismatch between the quant_config["ignore"] layer names and the runtime layer names, and it is not tied to a specific type of quantization (e.g. in https://huggingface.co/zai-org/GLM-4.5V-FP8/blob/main/config.json, quant_config["quant_method"] == "compressed-tensors").
If Z.ai were to publish a new checkpoint with a different quant_method, I would assume quant_config["ignore"] might still be the same as it is right now, and we would still need this fix? cc @zRzRzRzRzRzRzR
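For context, a quick sketch (assuming huggingface_hub is available) that inspects the published config referenced above and prints the quant_method and the ignore entries in question:

```python
import json

from huggingface_hub import hf_hub_download

# Fetch the published GLM-4.5V-FP8 config linked in the comment above.
cfg_path = hf_hub_download("zai-org/GLM-4.5V-FP8", "config.json")
with open(cfg_path) as f:
    quant_cfg = json.load(f)["quantization_config"]

print(quant_cfg["quant_method"])  # "compressed-tensors"
# The ignore entries use the "attn.qkv_proj" naming that this PR remaps to.
print([name for name in quant_cfg["ignore"] if "attn.qkv" in name][:3])
```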
houseroad left a comment
Thanks for the fix, please address the comments.
Signed-off-by: qizixi <qizixi@meta.com>
Force-pushed from 7f49211 to fdc3038.
Signed-off-by: qizixi <qizixi@meta.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Signed-off-by: qizixi <qizixi@meta.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Signed-off-by: qizixi <qizixi@meta.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Xiao Yu <xiao.yu@amd.com>
Purpose
Fix a numerical regression when running GLM-4.5V-FP8 with vLLM. The issue is that, for the vision attention QKV projection, the model config wants to turn off quantization by adding "visual.blocks.1.attn.qkv_proj" to the ignore list in https://huggingface.co/zai-org/GLM-4.5V-FP8/blob/main/config.json. However, at runtime the prefix is actually "visual.blocks.1.attn.qkv", so the ignore entries never match and the layers get quantized anyway.
This PR applies a short-term fix that changes the prefix so that quantization is correctly turned off for those layers.
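A hedged sketch of the shape of the change, assuming the vision attention module builds its QKV layer via vLLM's QKVParallelLinear; the function and parameter names here are illustrative rather than the exact merged diff:

```python
from vllm.model_executor.layers.linear import QKVParallelLinear


def build_vision_qkv(prefix: str, embed_dim: int, num_heads: int,
                     head_dim: int, quant_config=None) -> QKVParallelLinear:
    # The GLM-4.5V-FP8 checkpoint ignores "visual.blocks.N.attn.qkv_proj",
    # but the module was registered under "...attn.qkv". Remap the prefix
    # when a quantization config is present so the ignore list matches and
    # these layers stay unquantized.
    qkv_prefix = f"{prefix}.qkv_proj" if quant_config is not None else f"{prefix}.qkv"
    return QKVParallelLinear(
        hidden_size=embed_dim,
        head_size=head_dim,
        total_num_heads=num_heads,
        total_num_kv_heads=num_heads,
        bias=True,
        quant_config=quant_config,
        prefix=qkv_prefix,
    )
```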
Test Plan
Test Result
Remaining Issue
Output quality for FP8 text-only still seems lower than the BF16 version, but it is not garbage like the vision output was.
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.