
Conversation

@chenxi-yang (Contributor)

Summary: as title, generated with D80713197

Test Plan:
Run fused_moe on H100_80GB

Rollback Plan:

Reviewed By: zzh142857

Differential Revision: D80713433

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D80713433

Pull Request resolved: vllm-project#23443

@gemini-code-assist bot left a comment

Code Review

This pull request adds new FP8 configurations for glm4.5v on H100_80GB GPUs with tensor parallel sizes of 2 and 4. It introduces a new environment variable, VLLM_USE_FUSED_MOE_KERNEL_IN_COMPRESSED_QUANTIZATION, to control fused MoE kernel usage. The tensor parallelism logic in glm4_1v.py is refactored to align with vLLM's standard implementation. My review includes a suggestion to refactor duplicated code in compressed_tensors_moe.py to improve maintainability.
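As a rough illustration of the gating described above, the sketch below shows how vLLM-style code typically toggles a feature on an environment variable. The variable name comes from this PR's description; the helper function and default value are illustrative assumptions, not vLLM's actual implementation.

```python
import os

# Illustrative helper (assumption): vLLM-style feature gate keyed on the
# environment variable introduced by this PR. The real implementation in
# compressed_tensors_moe.py may read and cache the flag differently.
def use_fused_moe_kernel() -> bool:
    return os.environ.get(
        "VLLM_USE_FUSED_MOE_KERNEL_IN_COMPRESSED_QUANTIZATION", "0"
    ) == "1"

# Enabling the fused MoE kernel path for this process.
os.environ["VLLM_USE_FUSED_MOE_KERNEL_IN_COMPRESSED_QUANTIZATION"] = "1"
print(use_fused_moe_kernel())  # prints True once the flag is set to "1"
```

In practice the flag would be exported in the shell before launching the server, e.g. `VLLM_USE_FUSED_MOE_KERNEL_IN_COMPRESSED_QUANTIZATION=1 vllm serve ...` (invocation illustrative).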

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 23, 2025 00:57
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 23, 2025
@DarkLight1337 DarkLight1337 merged commit 308fa28 into vllm-project:main Aug 23, 2025
49 checks passed
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
Co-authored-by: Chenxi Yang <cxyang@meta.com>
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
Co-authored-by: Chenxi Yang <cxyang@meta.com>
Signed-off-by: Xiao Yu <xiao.yu@amd.com>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
Co-authored-by: Chenxi Yang <cxyang@meta.com>
mengxingkongzhouhan pushed a commit to mengxingkongzhouhan/vllm that referenced this pull request Aug 30, 2025
Co-authored-by: Chenxi Yang <cxyang@meta.com>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025
Co-authored-by: Chenxi Yang <cxyang@meta.com>
ekagra-ranjan pushed a commit to ekagra-ranjan/vllm that referenced this pull request Sep 4, 2025
Co-authored-by: Chenxi Yang <cxyang@meta.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Co-authored-by: Chenxi Yang <cxyang@meta.com>