@zixi-qi (Collaborator) commented Aug 15, 2025

Purpose

Fix a numerical regression when running GLM-4.5V-FP8 with vLLM. For the vision attention QKV projection, the model config file (https://huggingface.co/zai-org/GLM-4.5V-FP8/blob/main/config.json) turns off quantization by listing names such as "visual.blocks.1.attn.qkv_proj". At runtime, however, the prefix is actually "visual.blocks.1.attn.qkv", so the entry does not match the layer.

This PR applies a short-term fix that changes the prefix so that quantization is turned off for those layers.
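
To illustrate the mismatch, here is a minimal sketch. It is not the actual vLLM code: `is_ignored` and `remap_prefix` are hypothetical helpers, and exact-name matching against the ignore list is an assumption made for the example.

```python
from __future__ import annotations

# Hypothetical illustration only; names below are not vLLM APIs.
# Layer name as it is listed in the checkpoint's config.json.
IGNORE = {"visual.blocks.1.attn.qkv_proj"}


def is_ignored(prefix: str, ignore: set[str]) -> bool:
    """Return True when the layer should be left unquantized."""
    return prefix in ignore


def remap_prefix(prefix: str) -> str:
    """Rename a runtime '...attn.qkv' prefix to the '...attn.qkv_proj' form."""
    if prefix.endswith(".attn.qkv"):
        return prefix + "_proj"
    return prefix


runtime_prefix = "visual.blocks.1.attn.qkv"

# Without remapping, the runtime name is not found in the ignore list.
before_fix = is_ignored(runtime_prefix, IGNORE)

# With remapping, the name lines up with the config entry.
after_fix = is_ignored(remap_prefix(runtime_prefix), IGNORE)

print(before_fix, after_fix)
```

Under these assumptions, the lookup only succeeds once the runtime prefix is renamed to the form used in the config.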

Test Plan

CUDA_VISIBLE_DEVICES=4,5,6,7 python examples/offline_inference/glm4_5v.py --num_prompts 3 --model glm45-fp8 --mm

Test Result

  • Before the fix
Adding requests: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:05<00:00,  1.80s/it]
Processed prompts: 100%|█████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  1.60it/s, est. speed input: 9574.60 toks/s, output: 10.14 toks/s]
Generated text: ', '
Generated text: ', , the., the '
Generated text: ',  connected by, '
  • After the fix
Adding requests: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:04<00:00,  1.62s/it]
Processed prompts: 100%|█████████████████████████████████████████████████████████████████| 3/3 [00:14<00:00,  4.83s/it, est. speed input: 1239.39 toks/s, output: 50.00 toks/s]
Generated text: "<think>The user is asking to describe the content of the image. First, I need to look at the image provided. The image shows a duck swimming in water. Let me observe the details: the duck has a green head, a yellow beak, and its body is a mix of brown, white, and black. The water is blue with ripples, and there's a reflection of the duck in the water. I should describe the duck's appearance, its position in the water, and the surrounding environment. Make sure to mention the colors and any notable features like the reflection. Also, check if there are any other elements, but it seems the main focus is the duck. So the content is a duck swimming in water with its reflection visible.</think>\nThe image shows a duck swimming in a body of water. The duck has a green iridescent head, a bright yellow beak, and a body with brown, white, and black feathers. The water is a deep blue with gentle ripples, and the duck's reflection is visible on the surface. The overall scene depicts a calm aquatic environment with the duck as the central subject.\nThe answer is <|begin_of_box|>A duck swimming in water with its reflection visible.<|end_of_box|>"
Generated text: "<think>The user is asking to describe the content of the image. First, I need to look at the image carefully. The image shows a lion in a grassy field. The lion has a prominent mane, which is a mix of brown and darker colors. The background is a field of tall, dry grass with some green patches, indicating a savanna or grassland environment. The lion is sitting or standing in the grass, looking directly at the camera with a calm expression. I should describe the lion's features, the environment, and any other notable details. Make sure to mention the mane, the grass, and the overall setting. Also, check if there are any other elements, but it seems the main focus is the lion in the grassland.</think>\nThe image depicts a majestic lion with a thick, dark brown and black mane, sitting in a field of tall, dry grass. The lion is facing the camera, with a calm and alert expression. The background consists of a vast expanse of golden-brown grass, with some green patches, suggesting a savanna or grassland habitat. The lighting appears natural, likely from sunlight, highlighting the lion's features and the texture of the grass."
Generated text: "<think>The user is asking to describe the content of the image. First, I need to look at the image carefully. The image shows a bird perched on a tree branch. The bird has a blue head and back, with a white underpart. It's holding something in its beak, which looks like a nest material or maybe an insect. The background is blurred, showing green foliage and tree trunks, indicating a natural outdoor setting. I should describe the bird's appearance, its position, and the surroundings. Make sure to mention the colors, the object in its beak, and the environment.</think>\nThe image shows a bird perched on a tree branch. The bird has a blue head, back, and wings, with a white underbelly and a white patch on its neck. It is holding a bundle of what appears to be nesting material or an insect in its beak. The background is blurred, featuring green foliage and tree trunks, suggesting a natural, outdoor environment. The bird's claws are gripping the branch, and its posture indicates it might be preparing to build a nest or has just caught prey."

Remaining Issue

Output quality for FP8 on text-only prompts still seems lower than the BF16 version, but it is not garbage like the vision output was.

  • BF16
python examples/offline_inference/glm4_5v.py --num_prompts 3 --model glm45

Generated text: ' John and I am a 20 year old college student. I am currently studying to be a teacher and I am looking for a part time job to help pay for my college expenses. I have experience with children of all ages and I am very responsible. I am available to work any time and I am willing to travel. I am looking forward to hearing from you. Thank you. John. 610-639-8350.'
Generated text: " the head of state and head of government of the United States. The president leads the executive branch of the federal government and is the commander-in-chief of the United States Armed Forces.\nThe president is indirectly elected to a four-year term by an Electoral College (or by Congress if no candidate receives a majority of the vote in the Electoral College). Since 1937, with some exceptions, presidents have been inaugurated on January 20. On this date, following the oath of office, the president delivers the inaugural address, which sets the tone for the president's term. Since the 1950s, the president's annual address to Congress on the state of the union has often been referred to as the State of the Union address. Although the president may fulfill this requirement in any way he or she sees fit, since 1965, all presidents have delivered the speech in person before a joint session of Congress.\nSince the office was established in 1789, 44 men have served as president. The first president, George Washington, who won a unanimous vote in the Electoral College, set many precedents that would be followed in future administrations. Among these are limiting the president to two terms in office and the existence of a cabinet to advise the president on matters of policy"
Generated text: ' Paris. It is one of the most beautiful cities in the world. It is known as the City of Light. The population of Paris is about 10 million people. Paris is situated on the river Seine. It is the centre of the national economy and culture and is very beautiful. There are a lot of places of interest in Paris. Among them are: the Notre Dame cathedral, the Louvre with the famous painting the Mona Lisa by Leonardo da Vinci, the Alexandre III bridge, the Champs Elysees, the Arch of Triumph, the Eiffel Tower and others. Paris is famous for its beautiful trees and flowers too. Paris is an international centre for fashion, music, theatre, and the visual arts. There are many big stores, hotels and restaurants in Paris. Paris is a very old city. It is more than 20 centuries old. Paris has greatly contributed to education of mankind. It is still one of the world’s main centres for education and intellectual life. There are a lot of museums, art galleries, theatres, educational and scientific centres in Paris. The most well-known educational centre in Paris is the Sorbonne. Many great men of the world studied at the Sorbonne. Paris is called the City of Light not only because'

  • FP8
CUDA_VISIBLE_DEVICES=4,5,6,7 python examples/offline_inference/glm4_5v.py --num_prompts 3 --model glm45-fp8

Generated text: ' David and I am a 20 year old college student. I am currently studying to be a teacher. I am a very outgoing person and I am always up for a good time. I am looking for someone who is also outgoing and is not afraid to have a good time. I am looking for someone who is not afraid to be themselves and is not afraid to be spontaneous. I am looking for someone who is not afraid to be themselves and is not afraid to be spontaneous. I am looking for someone who is not afraid to be themselves and is not afraid to be spontaneous. I am looking for someone who is not afraid to be themselves and is not afraid to be spontaneous. I am looking for someone who is not afraid to be themselves and is not afraid to be spontaneous. I am looking for someone who is not afraid to be themselves and is not afraid to be spontaneous. I am looking for someone who is not afraid to be themselves and is not afraid to be spontaneous. I am looking for someone who is not afraid to be themselves and is not afraid to be spontaneous. I am looking for someone who is not afraid to be themselves and is not afraid to be spontaneous. I am looking for someone who is not afraid to be themselves and is not afraid to be spontaneous.'
Generated text: ' the head of state and head of government of the United States. The president is also the commander-in-chief of the United States armed forces. The president is a symbol of the United States.\nThe president is elected to a four-year term by the people. Since 1951, no person can be elected president more than twice. The president must be at least 35 years old and a natural-born citizen of the United States. The president must have lived in the United States for at least 14 years.\nThe president is assisted by the vice president. The vice president is also elected to a four-year term. The vice president is the president of the Senate. The vice president can vote in the Senate if there is a tie.\nThe president has many powers. The president can make treaties with other countries. The president can appoint judges to the Supreme Court. The president can veto laws passed by Congress. The president can pardon people who have been convicted of crimes.\nThe president is the most powerful person in the United States. The president is the leader of the country and the head of the government.\nThe president of the United States is the head of state and head of government of the United States. The president is also the commander-in-chief of the United States armed forces. The'
Generated text: ' Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the documentation Improvements or additions to documentation label Aug 15, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request fixes a numerical issue with the GLM-4.5V-FP8 model by correcting the prefix for the QKV projection layer during quantization. The change is conditional on the presence of a quantization configuration, which correctly handles the pre-quantized FP8 model. However, this approach is too broad and could unintentionally affect on-the-fly quantization for other models. I've provided a suggestion to make the condition more specific to FP8 models to prevent potential regressions.

@facebook-github-bot

@vladmihailescu has imported this pull request. If you are a Meta employee, you can view this in D80315216.

@Beckham007

any update?

@zixi-qi zixi-qi force-pushed the fix-glm45-fp8-numerics branch from df736bd to bbfc684 Compare August 15, 2025 18:34
@zixi-qi zixi-qi marked this pull request as ready for review August 15, 2025 18:34
@zixi-qi zixi-qi force-pushed the fix-glm45-fp8-numerics branch from bbfc684 to 7f49211 Compare August 15, 2025 18:42
@youkaichao youkaichao requested a review from houseroad August 16, 2025 15:36
Member

Oh, I just noticed we don't have GLM-4.5V example in examples/offline_inference/vision_language.py and examples/offline_inference/vision_language_multi_image.py. Let's consolidate this example into those two.

Collaborator Author

Done. Updated examples/offline_inference/vision_language.py and examples/offline_inference/vision_language_multi_image.py to include GLM-4.5V / GLM-4.5V-FP8, and removed the additional script.

Collaborator

Can we also check quant_config's quant_method to confirm that it is FP8 quantization? It could be another type of quantization.

Collaborator Author

I think the issue we are fixing is the mismatch between the quant_config["ignore"] layer names and the runtime layer names, which is not tied to a specific type of quantization (e.g. in https://huggingface.co/zai-org/GLM-4.5V-FP8/blob/main/config.json, quant_config["quant_method"] == "compressed-tensors").

If Z.ai were to publish a new checkpoint with a different quant_method, I would assume quant_config["ignore"] might still be the same as it is right now, in which case we would still need this fix. cc @zRzRzRzRzRzRzR
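
A rough sketch of why the check does not depend on the quantization type. This is an illustration under assumptions, not vLLM code: `unmatched_ignore_entries` is a hypothetical helper, the runtime name list is made up for the example, and ignore entries are assumed to be exact layer names.

```python
from __future__ import annotations


def unmatched_ignore_entries(quant_config: dict, runtime_names: list[str]) -> list[str]:
    """Return ignore entries that match no runtime layer name.

    Only quant_config["ignore"] is consulted; quant_config["quant_method"]
    plays no part in the comparison.
    """
    runtime = set(runtime_names)
    return [name for name in quant_config.get("ignore", []) if name not in runtime]


# Assumed runtime layer names, for illustration only.
runtime_names = ["visual.blocks.1.attn.qkv", "visual.blocks.1.attn.proj"]

# The same ignore list is flagged whatever the quant_method value is.
# "some-other-method" is a placeholder, not a real quantization method name.
for method in ("compressed-tensors", "some-other-method"):
    cfg = {"quant_method": method, "ignore": ["visual.blocks.1.attn.qkv_proj"]}
    print(method, unmatched_ignore_entries(cfg, runtime_names))
```

In this sketch the stale "qkv_proj" entry is reported for both configs, which is the reason for not gating the rename on a particular quant_method.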

Collaborator

@houseroad houseroad left a comment

Thanks for the fix, please address the comments.

Signed-off-by: qizixi <qizixi@meta.com>
@zixi-qi zixi-qi force-pushed the fix-glm45-fp8-numerics branch from 7f49211 to fdc3038 Compare August 18, 2025 20:30
@houseroad houseroad added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 19, 2025
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 19, 2025 06:13
@DarkLight1337 DarkLight1337 merged commit 4efd43e into vllm-project:main Aug 19, 2025
42 checks passed