
Conversation

Contributor

@rasmith rasmith commented Aug 26, 2025

The tensor loaded into bn is multiplied by stride_k_cache_bs in _fwd_kernel in prefix_prefill.py, causing an integer overflow that produces negative offsets and results in a GPU segfault. Changing stride_k_cache_bs to tl.int64 in the function signature did not work; casting the bn tensor to tl.int64 fixes the problem. I added some additional casts to _fwd_kernel_flash_attn_v2 and _fwd_kernel_alibi as well.

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
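For readers unfamiliar with the failure mode, below is a minimal Triton sketch of the pattern the fix targets. It is not the actual vllm kernel; the kernel name, pointers, and block sizes are illustrative stand-ins for the block-table load and cache indexing done in prefix_prefill.py.

```python
# Minimal sketch of the overflow pattern and fix, not the actual vllm kernel.
# Illustrative names: _gather_kernel, block_table_ptr, k_cache_ptr, out_ptr.
import triton
import triton.language as tl


@triton.jit
def _gather_kernel(
    block_table_ptr,    # int32 block indices, one per program
    k_cache_ptr,        # large KV-cache tensor being indexed
    out_ptr,
    stride_k_cache_bs,  # elements between consecutive cache blocks
    BLOCK: tl.constexpr,
):
    pid = tl.program_id(0)
    # bn is loaded as int32. Multiplying it directly by a large stride can
    # exceed 2**31 - 1, wrap to a negative offset, and trigger an illegal
    # memory access (the GPU segfault described above).
    bn = tl.load(block_table_ptr + pid)
    offs = tl.arange(0, BLOCK)
    # The fix: widen the loaded index to 64-bit before forming the offset.
    base = bn.to(tl.int64) * stride_k_cache_bs
    vals = tl.load(k_cache_ptr + base + offs)
    tl.store(out_ptr + pid * BLOCK + offs, vals)
```

Roughly, once the product bn * stride_k_cache_bs approaches 2**31 the 32-bit result wraps negative; per the description above, annotating the stride argument as tl.int64 alone was not sufficient, while widening the loaded bn tensor resolved it.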
@mergify mergify bot added the rocm Related to AMD ROCm label Aug 26, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly addresses a critical integer overflow in the _fwd_kernel for AMD GPUs by casting the bn tensor to tl.int64, which prevents a potential GPU segfault. However, as noted in the pull request description, similar vulnerabilities exist in _fwd_kernel_flash_attn_v2 and _fwd_kernel_alibi. The fixes for these functions are currently missing from the patch. It is crucial to include these changes to ensure the bug is fully resolved across all relevant kernels.

@gshtras gshtras added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 2, 2025
@gshtras gshtras enabled auto-merge (squash) September 2, 2025 16:51
@gshtras gshtras merged commit 457e471 into vllm-project:main Sep 2, 2025
39 checks passed
845473182 pushed a commit to 845473182/vllm that referenced this pull request Sep 3, 2025
* 'main' of https://github.com/845473182/vllm: (457 commits)
  [BugFix] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models (vllm-project#24132)
  [Misc] Add check for dual_chunk_attention (vllm-project#24070)
  [Doc]: fix typos in Python comments (vllm-project#24115)
  [Doc]: fix typos in Python comments (vllm-project#24093)
  [Compile] Fix Compile Warning for `w4a8_mm_entry.cu` (vllm-project#23660)
  fix some typos (vllm-project#24071)
  [V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing (vllm-project#23656)
  Upgrade xgrammar to 0.1.23 (vllm-project#22988)
  Update release pipeline post PyTorch 2.8.0 update (vllm-project#24073)
  [XPU] Fix the bug of LoRA logits on the XPU platform (vllm-project#24081)
  [CI/Build] Disable SiluMul NVFP4 quant fusion tests (vllm-project#24121)
  [Bug] R1 Accuracy: Fix `routed_scaling_factor` Double Mul Issue (vllm-project#24119)
  [AMD][Kernel][Bugfix] Cast offsets tensor bn to tl.int64 to avoid GPU segfault (vllm-project#23692)
  [CI] Enable all hf transformers baselines in test_hybrid (vllm-project#23936)
  [Log] Only Print Profiler Results on Rank 0 (vllm-project#23370)
  Fix weights loading for Apertus (vllm-project#24100)
  [Metrics] Deprecate TPOT in favor of ITL (vllm-project#24110)
  [Bugfix] Fix packed_factor missing attribute error (vllm-project#23902)
  Run ruff format on a few files. (vllm-project#24075)
  [Bugfix] Fix transform_config parsing in Compressed Tensors (vllm-project#23945)
  ...
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
… segfault (vllm-project#23692)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
… segfault (vllm-project#23692)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm


4 participants