
Conversation

Contributor

@rasmith rasmith commented Aug 26, 2025

The tensor loaded into bn is multiplied by stride_k_cache_bs in _fwd_kernel in prefix_prefill.py, causing an integer overflow that produces negative offsets and results in a GPU segfault. Changing stride_k_cache_bs to tl.int64 in the function signature did not work; casting the bn tensor to tl.int64 fixes the problem. I added some additional casts to _fwd_kernel_flash_attn_v2 and _fwd_kernel_alibi as well.

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
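For readers unfamiliar with the failure mode, below is a minimal Triton sketch of the pattern the fix targets. It is not the actual vllm kernel; the kernel name, pointers, and block sizes are illustrative stand-ins for the block-table load and cache indexing done in prefix_prefill.py.

```python
# Minimal sketch of the overflow pattern and fix, not the actual vllm kernel.
# Illustrative names: _gather_kernel, block_table_ptr, k_cache_ptr, out_ptr.
import triton
import triton.language as tl


@triton.jit
def _gather_kernel(
    block_table_ptr,    # int32 block indices, one per program
    k_cache_ptr,        # large KV-cache tensor being indexed
    out_ptr,
    stride_k_cache_bs,  # elements between consecutive cache blocks
    BLOCK: tl.constexpr,
):
    pid = tl.program_id(0)
    # bn is loaded as int32. Multiplying it directly by a large stride can
    # exceed 2**31 - 1, wrap to a negative offset, and trigger an illegal
    # memory access (the GPU segfault described above).
    bn = tl.load(block_table_ptr + pid)
    offs = tl.arange(0, BLOCK)
    # The fix: widen the loaded index to 64-bit before forming the offset.
    base = bn.to(tl.int64) * stride_k_cache_bs
    vals = tl.load(k_cache_ptr + base + offs)
    tl.store(out_ptr + pid * BLOCK + offs, vals)
```

Roughly, once the product bn * stride_k_cache_bs approaches 2**31 the 32-bit result wraps negative; per the description above, annotating the stride argument as tl.int64 alone was not sufficient, while widening the loaded bn tensor resolved it.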
@mergify mergify bot added the rocm Related to AMD ROCm label Aug 26, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly addresses a critical integer overflow in the _fwd_kernel for AMD GPUs by casting the bn tensor to tl.int64, which prevents a potential GPU segfault. However, as noted in the pull request description, similar vulnerabilities exist in _fwd_kernel_flash_attn_v2 and _fwd_kernel_alibi. The fixes for these functions are currently missing from the patch. It is crucial to include these changes to ensure the bug is fully resolved across all relevant kernels.

@gshtras gshtras added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 2, 2025
@gshtras gshtras enabled auto-merge (squash) September 2, 2025 16:51
@gshtras gshtras merged commit 457e471 into vllm-project:main Sep 2, 2025
39 checks passed
845473182 pushed a commit to 845473182/vllm that referenced this pull request Sep 3, 2025
* 'main' of https://github.com/845473182/vllm: (457 commits)
  [BugFix] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models (vllm-project#24132)
  [Misc] Add check for dual_chunk_attention (vllm-project#24070)
  [Doc]: fix typos in Python comments (vllm-project#24115)
  [Doc]: fix typos in Python comments (vllm-project#24093)
  [Compile] Fix Compile Warning for `w4a8_mm_entry.cu` (vllm-project#23660)
  fix some typos (vllm-project#24071)
  [V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing (vllm-project#23656)
  Upgrade xgrammar to 0.1.23 (vllm-project#22988)
  Update release pipeline post PyTorch 2.8.0 update (vllm-project#24073)
  [XPU] Fix the bug of LoRA logits on the XPU platform (vllm-project#24081)
  [CI/Build] Disable SiluMul NVFP4 quant fusion tests (vllm-project#24121)
  [Bug] R1 Accuracy: Fix `routed_scaling_factor` Double Mul Issue (vllm-project#24119)
  [AMD][Kernel][Bugfix] Cast offsets tensor bn to tl.int64 to avoid GPU segfault (vllm-project#23692)
  [CI] Enable all hf transformers baselines in test_hybrid (vllm-project#23936)
  [Log] Only Print Profiler Results on Rank 0 (vllm-project#23370)
  Fix weights loading for Apertus (vllm-project#24100)
  [Metrics] Deprecate TPOT in favor of ITL (vllm-project#24110)
  [Bugfix] Fix packed_factor missing attribute error (vllm-project#23902)
  Run ruff format on a few files. (vllm-project#24075)
  [Bugfix] Fix transform_config parsing in Compressed Tensors (vllm-project#23945)
  ...
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
… segfault (vllm-project#23692)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
… segfault (vllm-project#23692)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm


4 participants