Conversation

@youzhedian (Contributor) commented Aug 29, 2025

Fix undefined silu_and_mul_nvfp4_quant on H- or A-series CUDA devices.

Signed-off-by: hongchao <hongchao@msh.team>
@gemini-code-assist (Contributor, bot) left a comment


Code Review

This pull request correctly identifies the need to restrict the compilation and usage of silu_and_mul_nvfp4_quant to specific CUDA architectures that support NVFP4. The changes in the C++ files appear to implement this correctly using preprocessor directives. However, there is a critical issue in the corresponding Python check within vllm/compilation/fix_functionalization.py. The device capability check uses an incorrect value, which will prevent this optimized kernel from being used on the intended hardware (e.g., Blackwell GPUs).
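To illustrate why encoded capability values are easy to get wrong, here is a minimal sketch (names hypothetical, not vLLM's actual API) that compares (major, minor) tuples instead of a single encoded integer, so SM 10.0 cannot be confused with the encoded value 100:

```python
# Hypothetical helper: an integer capability can ambiguously mean
# "major only" (10) or "major * 10 + minor" (100). Comparing
# (major, minor) tuples sidesteps the encoding question entirely.

def has_device_capability(device_cap, required):
    """Both arguments are (major, minor) tuples; tuple comparison is
    lexicographic, so (10, 0) >= (9, 0) behaves as expected."""
    return device_cap >= required

BLACKWELL = (10, 0)  # SM 10.0
HOPPER = (9, 0)      # SM 9.0

print(has_device_capability(BLACKWELL, (10, 0)))  # True
print(has_device_capability(HOPPER, (10, 0)))     # False
```

With tuples there is no way to write the ambiguous check `capability >= 100` in the first place, which is the class of bug the review is pointing at.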

Comment on lines 100 to 102:

    elif current_platform.has_device_capability(
        100
    ) and at_target == torch.ops._C.silu_and_mul_nvfp4_quant.default:

critical

The check current_platform.has_device_capability(100) is incorrect. The has_device_capability method compares its integer argument with the major version of the CUDA compute capability. For Blackwell GPUs (SM 10.0), the major version is 10. Therefore, the check 10 >= 100 will always evaluate to false, preventing this code path from ever being taken on the intended hardware. This should be has_device_capability(10) to correctly target devices with compute capability 10.x and above.

Suggested change:

    - elif current_platform.has_device_capability(
    -     100
    - ) and at_target == torch.ops._C.silu_and_mul_nvfp4_quant.default:
    + elif current_platform.has_device_capability(
    +     10
    + ) and at_target == torch.ops._C.silu_and_mul_nvfp4_quant.default:

@zou3519 (Collaborator) commented Aug 29, 2025

An easier way to check here is 'if hasattr(torch.ops._C, "silu_and_mul_nvfp4_quant")'.

A Member left a comment:

I feel 'if hasattr(torch.ops._C, "silu_and_mul_nvfp4_quant")' makes more sense.

@youzhedian (Contributor, PR author) commented:

Related issue: #23916

@youzhedian youzhedian changed the title fix undefined silu_and_mul_nvfp4_quant [BUGFIX ] fix undefined silu_and_mul_nvfp4_quant Aug 29, 2025
@youkaichao (Member) commented:

@mgoin can you help take a look?

@zou3519 (Collaborator) commented Aug 29, 2025

I got this as well; I bisected to #23671 as the cause. I have a revert open at #23947, but let me look at this PR.

@zou3519 (Collaborator) commented Aug 29, 2025

I confirmed that this PR fixes the issue for me.

@youkaichao (Member) left a comment

thanks for the fix!

zou3519 and others added 2 commits August 29, 2025 09:13
Signed-off-by: Richard Zou <zou3519@gmail.com>
@zou3519 zou3519 added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 29, 2025
@elvischenv (Contributor) commented Aug 29, 2025

I already have a fix in #23727.

Like the other nvfp4 kernels, I added the function definition in csrc/quantization/fp4/nvfp4_quant_entry.cu so the function will always be defined in the general code path. This should fix the issue in a cleaner way.
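The "entry point always defined" pattern described above can be sketched in Python as an illustrative analogue (the real fix lives in CUDA/C++; all names below are hypothetical): the symbol exists on every build, and only the body is gated, so callers fail with a clear runtime error instead of an undefined-symbol lookup error.

```python
# Illustrative analogue of defining the entry point unconditionally and
# gating only the implementation behind a build-time flag.
NVFP4_SUPPORTED = False  # a real build would set this for SM >= 10.0


def silu_and_mul_nvfp4_quant(x):
    """Always defined; dispatches to the compiled kernel only when the
    build supports NVFP4, otherwise raises a descriptive error."""
    if NVFP4_SUPPORTED:
        return _nvfp4_kernel(x)  # hypothetical compiled kernel binding
    raise RuntimeError(
        "silu_and_mul_nvfp4_quant: this build has no NVFP4 support")
```

The benefit is that lookup-time checks (like the capability or hasattr checks discussed earlier) become unnecessary for mere existence: the function is always there, and unsupported hardware gets an actionable error message at call time.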

@simon-mo simon-mo merged commit 0dc9532 into vllm-project:main Aug 29, 2025
10 of 18 checks passed
elvischenv added a commit to elvischenv/vllm that referenced this pull request Sep 3, 2025
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
Signed-off-by: hongchao <hongchao@msh.team>
Signed-off-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: hongchao <hongchao@msh.team>
Co-authored-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>
dcmaddix pushed a commit to dcmaddix/vllm that referenced this pull request Sep 19, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: hongchao <hongchao@msh.team>
Signed-off-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: hongchao <hongchao@msh.team>
Co-authored-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>
dcmaddix pushed a commit to dcmaddix/vllm that referenced this pull request Oct 1, 2025
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
6 participants