
Conversation

@heheda12345
Collaborator

@heheda12345 heheda12345 commented Sep 11, 2025

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Chen Zhang <zhangch99@outlook.com>
@mergify mergify bot added the qwen (Related to Qwen models) label on Sep 11, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds a new fused MoE kernel configuration for Qwen3-Next on H100 with TP=4. However, there is a critical issue with the filename E=512,N=128,device_name=NVIDIA_H100_80GB_HBM3.json. The intermediate size N=128 seems unusually small for a large model, especially when compared to the example of Mixtral (N=3584 for TP=4) in the documentation. If this value is incorrect, these optimized settings will not be loaded, negating the benefit of this change. It is also recommended to fill out the pull request description with benchmark results that produced these configuration values to provide context for the review.
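For reference, here is a minimal sketch of the filename convention the review is pointing at. It only mirrors the pattern visible in this PR's filename; the helper names and the fallback behavior described in the comments are assumptions for illustration, not vLLM's actual lookup code.

```python
# Hypothetical sketch: how a tuned fused-MoE config might be located by name.
# The filename pattern mirrors the file added in this PR; the helper names and
# the fallback behavior noted below are assumptions, not vLLM's real code.
import os


def fused_moe_config_filename(E: int, N: int, device_name: str) -> str:
    """E = number of experts, N = per-rank (sharded) intermediate size."""
    return f"E={E},N={N},device_name={device_name}.json"


def find_tuned_config(config_dir: str, E: int, N: int, device_name: str) -> str | None:
    """Return the tuned config path if one exists for this exact (E, N, device)."""
    path = os.path.join(config_dir, fused_moe_config_filename(E, N, device_name))
    # If E or N does not match the running model, no file is found and
    # default kernel parameters would be used instead.
    return path if os.path.exists(path) else None


print(fused_moe_config_filename(512, 128, "NVIDIA_H100_80GB_HBM3"))
# -> E=512,N=128,device_name=NVIDIA_H100_80GB_HBM3.json
```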

@@ -0,0 +1,146 @@
{
Contributor


critical

The intermediate size N=128 in the filename appears to be incorrect for a model like Qwen3-Next running on H100 with TP=4. For context, the README in this directory gives N=3584 for Mixtral with TP=4; a value of 128 is unusually small for a large model on this hardware and is likely a typo. With an incorrect N in the filename, vLLM will not load this optimized configuration at runtime for the intended model and will fall back to default, sub-optimal settings, negating the performance benefit of this change. The N in the filename should match the model's actual sharded intermediate size.
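As a quick sanity check on the arithmetic behind this comment, here is a sketch under the assumption that N in the filename is the per-rank sharded intermediate size, i.e. moe_intermediate_size // tp_size:

```python
# Assumption: N in the config filename is the per-expert intermediate size
# divided by the tensor-parallel degree.
def sharded_n(moe_intermediate_size: int, tp_size: int) -> int:
    assert moe_intermediate_size % tp_size == 0
    return moe_intermediate_size // tp_size


# Mixtral example cited in the README: N=3584 for TP=4
# (14336 is Mixtral-8x7B's per-expert intermediate size).
print(sharded_n(14336, 4))  # 3584

# For E=512,N=128 at TP=4 to be correct, the model's per-expert
# intermediate size would have to be 128 * 4 = 512.
print(sharded_n(512, 4))  # 128
```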

@simon-mo simon-mo merged commit f82f7a8 into vllm-project:main Sep 11, 2025
7 of 10 checks passed
@nyo16

nyo16 commented Sep 12, 2025

Hi, quick question: how can I generate this config for the H100 NVL?

@heheda12345
Collaborator Author

skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
@nyo16

nyo16 commented Sep 14, 2025

Sorry to bother you again. I tried the examples I found, but it keeps trying to connect to Ray and times out after a while. Would you mind posting the arguments you used?

@heheda12345
Collaborator Author

My command is python3 benchmarks/kernels/benchmark_moe.py --model MODEL_PATH --tp-size 4 --tune
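For anyone reproducing this on other hardware: running the tuner writes out a JSON config shaped like the file added in this PR. Below is a rough illustration of that shape, expressed as a Python dict; the key names follow the usual fused-MoE configs in this directory, but the values are placeholders, not the tuned results from this PR.

```python
# Illustrative only: the general shape of a tuned fused-MoE config file.
# Outer keys are batch sizes; inner keys are Triton kernel launch parameters.
# All numbers below are placeholders, not the values produced by this PR.
import json

example_config = {
    "1": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 3,
    },
    "16": {
        "BLOCK_SIZE_M": 32,
        "BLOCK_SIZE_N": 128,
        "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 8,
        "num_warps": 4,
        "num_stages": 4,
    },
}

print(json.dumps(example_config, indent=2))
```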

dsxsteven pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 15, 2025
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
@nyo16

nyo16 commented Sep 17, 2025

My command is python3 benchmarks/kernels/benchmark_moe.py --model MODEL_PATH --tp-size 4 --tune

Thank you! I had firewall issues with Ray; I solved it.

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Labels

qwen (Related to Qwen models)


3 participants