[Qwen3-Next] MOE configs for H100 TP4 #24699
Code Review
This pull request adds a new fused MoE kernel configuration for Qwen3-Next on H100 with TP=4. However, there is a critical issue with the filename E=512,N=128,device_name=NVIDIA_H100_80GB_HBM3.json. The intermediate size N=128 seems unusually small for a large model, especially when compared to the example of Mixtral (N=3584 for TP=4) in the documentation. If this value is incorrect, these optimized settings will not be loaded, negating the benefit of this change. It is also recommended to fill out the pull request description with benchmark results that produced these configuration values to provide context for the review.
@@ -0,0 +1,146 @@
{
The intermediate size N=128 in the filename appears to be incorrect for a model like Qwen3-Next running on H100 with TP=4. For context, the README in this directory mentions that for Mixtral with TP=4, the intermediate size N is 3584. A value of 128 is unusually small for a large model on this hardware and is likely a typo. An incorrect N value in the filename will cause vllm to fail to load this optimized configuration at runtime for the intended model, leading to the use of default, sub-optimal configurations. This would negate the performance benefits of this change. The value of N in the filename should be corrected to match the model's actual sharded intermediate size.
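The filename check discussed in this review can be sketched as follows. This is a minimal illustration, not vLLM's actual loader: `sharded_intermediate_size` and `moe_config_filename` are hypothetical helpers, and the per-expert intermediate size of 512 used for Qwen3-Next is an assumption that should be verified against the model's config.json. The Mixtral figure (14336 split across TP=4 gives 3584) matches the README example cited above.

```python
def sharded_intermediate_size(intermediate_size: int, tp_size: int) -> int:
    """Per-GPU expert intermediate size when split evenly across TP ranks."""
    if intermediate_size % tp_size != 0:
        raise ValueError(
            f"intermediate_size={intermediate_size} is not divisible by tp_size={tp_size}"
        )
    return intermediate_size // tp_size


def moe_config_filename(num_experts: int, n: int, device_name: str) -> str:
    """Build a name following the pattern seen in this PR (hypothetical helper)."""
    # Spaces in the device name are replaced with underscores,
    # as in NVIDIA_H100_80GB_HBM3.
    return f"E={num_experts},N={n},device_name={device_name.replace(' ', '_')}.json"


# Mixtral example from the README: 14336 / 4 = 3584.
mixtral_n = sharded_intermediate_size(14336, 4)

# Assumed value: 512 as the Qwen3-Next per-expert intermediate size.
qwen3_next_n = sharded_intermediate_size(512, 4)

print(mixtral_n)
print(moe_config_filename(512, qwen3_next_n, "NVIDIA H100 80GB HBM3"))
```

If the model's per-expert intermediate size is indeed 512, the sharded value for TP=4 is 128, which would match the filename in this PR. The model's own config.json is the authority for that number.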
Hi, quick question: how can I generate this for the H100 NVL?
You can follow the instruction here.
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Sorry to bother you again. I tried the examples I found, but it tries to connect to Ray and times out after a while. Would you mind posting the arguments you used?
My command is
Thank you! I had firewall issues with Ray, and I have solved them.
Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
Test Plan
Test Result