[BUGFIX] fix undefined silu_and_mul_nvfp4_quant #23929
Conversation
Signed-off-by: hongchao <hongchao@msh.team>
Code Review
This pull request correctly identifies the need to restrict the compilation and usage of silu_and_mul_nvfp4_quant to specific CUDA architectures that support NVFP4. The changes in the C++ files appear to implement this correctly using preprocessor directives. However, there is a critical issue in the corresponding Python check within vllm/compilation/fix_functionalization.py. The device capability check uses an incorrect value, which will prevent this optimized kernel from being used on the intended hardware (e.g., Blackwell GPUs).
```python
elif current_platform.has_device_capability(
    100
) and at_target == torch.ops._C.silu_and_mul_nvfp4_quant.default:
```
The check `current_platform.has_device_capability(100)` is incorrect. The `has_device_capability` method compares its integer argument with the major version of the CUDA compute capability. For Blackwell GPUs (SM 10.0), the major version is 10. Therefore, the check `10 >= 100` will always evaluate to false, preventing this code path from ever being taken on the intended hardware. This should be `has_device_capability(10)` to correctly target devices with compute capability 10.x and above.
Suggested change:

```diff
 elif current_platform.has_device_capability(
-    100
+    10
 ) and at_target == torch.ops._C.silu_and_mul_nvfp4_quant.default:
```
An easier way to check here is `if hasattr(torch.ops._C, "silu_and_mul_nvfp4_quant")`.
I feel `if hasattr(torch.ops._C, "silu_and_mul_nvfp4_quant")` makes more sense.
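A minimal sketch of that suggestion (the helper name is hypothetical; `at_target` is the node target being matched, as in the diff above):

```python
import torch

def targets_nvfp4_silu_mul(at_target) -> bool:
    # The custom op is only registered when the extension was compiled with
    # NVFP4 support, so hasattr doubles as a capability check. The
    # short-circuiting `and` keeps the `.default` lookup safe on builds
    # where the op is absent.
    return hasattr(torch.ops._C, "silu_and_mul_nvfp4_quant") and (
        at_target == torch.ops._C.silu_and_mul_nvfp4_quant.default
    )
```

This avoids hard-coding any compute-capability threshold in the Python pass.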
Related issue: #23916

@mgoin can you help take a look?

I confirmed that this PR fixes the issue for me.
youkaichao left a comment:
thanks for the fix!
I already have a fix in #23727. Like the other nvfp4 kernels, I added the function definition in csrc/quantization/fp4/nvfp4_quant_entry.cu so the function will always be defined in the general code path. This should fix the issue in a cleaner way.
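For context, a hedged sketch of the two failure modes (behavior assumed from the comment above, not verified against #23727):

```python
import torch

# Without a general-path definition, the op is never registered on builds
# that lack NVFP4 support, so even looking it up fails.
kernel = getattr(torch.ops._C, "silu_and_mul_nvfp4_quant", None)

if kernel is None:
    # Pre-#23727 behavior: every caller must guard the lookup itself.
    print("silu_and_mul_nvfp4_quant is undefined in this build")
else:
    # With the entry point always defined (the #23727 approach, assumed),
    # the lookup succeeds everywhere and unsupported builds are expected to
    # fail only at call time with an explicit error from the C++ stub.
    print("silu_and_mul_nvfp4_quant is registered")
```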
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: hongchao <hongchao@msh.team>
Signed-off-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: hongchao <hongchao@msh.team>
Co-authored-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Fixes undefined silu_and_mul_nvfp4_quant on Hopper or Ampere CUDA devices.
