
Conversation

@chenxi-yang (Contributor)

Summary: as title, generated with D80713197

Test Plan:
Run fused_moe on H100_80GB

Rollback Plan:

Reviewed By: zzh142857

Differential Revision: D80713433

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D80713433

Pull Request resolved: vllm-project#23443

@gemini-code-assist bot left a comment

Code Review

This pull request adds new FP8 configurations for glm4.5v on H100_80GB GPUs with tensor parallel sizes of 2 and 4. It introduces a new environment variable, VLLM_USE_FUSED_MOE_KERNEL_IN_COMPRESSED_QUANTIZATION, to control fused MoE kernel usage. The tensor parallelism logic in glm4_1v.py is refactored to align with vLLM's standard implementation. My review includes a suggestion to refactor duplicated code in compressed_tensors_moe.py to improve maintainability.
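As a rough illustration of the gating described above, the sketch below shows how vLLM-style code typically toggles a feature on an environment variable. The variable name comes from this PR's description; the helper function and default value are illustrative assumptions, not vLLM's actual implementation.

```python
import os

# Illustrative helper (assumption): vLLM-style feature gate keyed on the
# environment variable introduced by this PR. The real implementation in
# compressed_tensors_moe.py may read and cache the flag differently.
def use_fused_moe_kernel() -> bool:
    return os.environ.get(
        "VLLM_USE_FUSED_MOE_KERNEL_IN_COMPRESSED_QUANTIZATION", "0"
    ) == "1"

# Enabling the fused MoE kernel path for this process.
os.environ["VLLM_USE_FUSED_MOE_KERNEL_IN_COMPRESSED_QUANTIZATION"] = "1"
print(use_fused_moe_kernel())  # prints True once the flag is set to "1"
```

In practice the flag would be exported in the shell before launching the server, e.g. `VLLM_USE_FUSED_MOE_KERNEL_IN_COMPRESSED_QUANTIZATION=1 vllm serve ...` (invocation illustrative).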

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 23, 2025 00:57
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 23, 2025
@DarkLight1337 DarkLight1337 merged commit 308fa28 into vllm-project:main Aug 23, 2025
49 checks passed
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
Co-authored-by: Chenxi Yang <cxyang@meta.com>
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
Co-authored-by: Chenxi Yang <cxyang@meta.com>
Signed-off-by: Xiao Yu <xiao.yu@amd.com>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
Co-authored-by: Chenxi Yang <cxyang@meta.com>
mengxingkongzhouhan pushed a commit to mengxingkongzhouhan/vllm that referenced this pull request Aug 30, 2025
Co-authored-by: Chenxi Yang <cxyang@meta.com>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025
Co-authored-by: Chenxi Yang <cxyang@meta.com>
ekagra-ranjan pushed a commit to ekagra-ranjan/vllm that referenced this pull request Sep 4, 2025
Co-authored-by: Chenxi Yang <cxyang@meta.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Co-authored-by: Chenxi Yang <cxyang@meta.com>