
Conversation

@heheda12345
Collaborator

@heheda12345 heheda12345 commented Sep 11, 2025

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
@mergify mergify bot added the qwen (Related to Qwen models) label on Sep 11, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds a new fused MoE kernel configuration for Qwen3-Next on H100 with TP=4. However, there is a critical issue with the filename E=512,N=128,device_name=NVIDIA_H100_80GB_HBM3.json. The intermediate size N=128 seems unusually small for a large model, especially when compared to the example of Mixtral (N=3584 for TP=4) in the documentation. If this value is incorrect, these optimized settings will not be loaded, negating the benefit of this change. It is also recommended to fill out the pull request description with benchmark results that produced these configuration values to provide context for the review.
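For reference, here is a minimal sketch of the filename convention the review is pointing at. It only mirrors the pattern visible in this PR's filename; the helper names and the fallback behavior described in the comments are assumptions for illustration, not vLLM's actual lookup code.

```python
# Hypothetical sketch: how a tuned fused-MoE config might be located by name.
# The filename pattern mirrors the file added in this PR; the helper names and
# the fallback behavior noted below are assumptions, not vLLM's real code.
import os


def fused_moe_config_filename(E: int, N: int, device_name: str) -> str:
    """E = number of experts, N = per-rank (sharded) intermediate size."""
    return f"E={E},N={N},device_name={device_name}.json"


def find_tuned_config(config_dir: str, E: int, N: int, device_name: str) -> str | None:
    """Return the tuned config path if one exists for this exact (E, N, device)."""
    path = os.path.join(config_dir, fused_moe_config_filename(E, N, device_name))
    # If E or N does not match the running model, no file is found and
    # default kernel parameters would be used instead.
    return path if os.path.exists(path) else None


print(fused_moe_config_filename(512, 128, "NVIDIA_H100_80GB_HBM3"))
# -> E=512,N=128,device_name=NVIDIA_H100_80GB_HBM3.json
```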

@@ -0,0 +1,146 @@
{
Contributor


critical

The intermediate size N=128 in the filename appears to be incorrect for a model like Qwen3-Next running on H100 with TP=4. For context, the README in this directory gives N=3584 for Mixtral with TP=4; a value of 128 is unusually small for a large model on this hardware and is likely a typo. With an incorrect N in the filename, vLLM will not load this optimized configuration at runtime for the intended model and will fall back to default, sub-optimal settings, negating the performance benefit of this change. The N in the filename should match the model's actual sharded intermediate size.
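As a quick sanity check on the arithmetic behind this comment, here is a sketch under the assumption that N in the filename is the per-rank sharded intermediate size, i.e. moe_intermediate_size // tp_size:

```python
# Assumption: N in the config filename is the per-expert intermediate size
# divided by the tensor-parallel degree.
def sharded_n(moe_intermediate_size: int, tp_size: int) -> int:
    assert moe_intermediate_size % tp_size == 0
    return moe_intermediate_size // tp_size


# Mixtral example cited in the README: N=3584 for TP=4
# (14336 is Mixtral-8x7B's per-expert intermediate size).
print(sharded_n(14336, 4))  # 3584

# For E=512,N=128 at TP=4 to be correct, the model's per-expert
# intermediate size would have to be 128 * 4 = 512.
print(sharded_n(512, 4))  # 128
```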

@simon-mo simon-mo merged commit f82f7a8 into vllm-project:main Sep 11, 2025
7 of 10 checks passed
@nyo16

nyo16 commented Sep 12, 2025

Hi, quick question: how can I generate this config for the H100 NVL?

@heheda12345
Collaborator Author

skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
@nyo16

nyo16 commented Sep 14, 2025

Sorry to bother you again. I tried the examples I found, but it keeps trying to connect to Ray and times out after a while. Would you mind posting the arguments you used?

@heheda12345
Collaborator Author

My command is python3 benchmarks/kernels/benchmark_moe.py --model MODEL_PATH --tp-size 4 --tune
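For anyone reproducing this on other hardware: running the tuner writes out a JSON config shaped like the file added in this PR. Below is a rough illustration of that shape, expressed as a Python dict; the key names follow the usual fused-MoE configs in this directory, but the values are placeholders, not the tuned results from this PR.

```python
# Illustrative only: the general shape of a tuned fused-MoE config file.
# Outer keys are batch sizes; inner keys are Triton kernel launch parameters.
# All numbers below are placeholders, not the values produced by this PR.
import json

example_config = {
    "1": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 3,
    },
    "16": {
        "BLOCK_SIZE_M": 32,
        "BLOCK_SIZE_N": 128,
        "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 8,
        "num_warps": 4,
        "num_stages": 4,
    },
}

print(json.dumps(example_config, indent=2))
```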

dsxsteven pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 15, 2025
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
@nyo16

nyo16 commented Sep 17, 2025

My command is python3 benchmarks/kernels/benchmark_moe.py --model MODEL_PATH --tp-size 4 --tune

Thank you! I had firewall issues with Ray; I solved it.

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Labels

qwen (Related to Qwen models)


3 participants