

@WoosukKwon
Collaborator

No description provided.

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
@WoosukKwon WoosukKwon changed the title [FlashInfer] Fix potential race condition for paged_kv_indptr_cpu [BugFix][FlashInfer] Fix potential race condition for paged_kv_indptr_cpu Aug 27, 2025
@mergify mergify bot added the v1 label Aug 27, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly identifies and fixes a potential race condition in the FlashInfer backend related to paged_kv_indptr_cpu. The race condition could occur during asynchronous data transfers to the GPU, especially when CUDA graphs are enabled. The implemented solution, which involves using an intermediate buffer (paged_kv_indptr_buffer) for the asynchronous copy, is a standard and effective way to resolve this kind of issue. The changes also include a minor optimization for calculating the number of actual pages. Overall, the fix is well-implemented and enhances the robustness of the attention mechanism.
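The hazard the review describes can be illustrated with a toy model. This is a minimal sketch, not vLLM's or FlashInfer's code. It assumes a simplified "stream" in which an enqueued non-blocking copy reads its host source only when it later executes. `ToyStream`, `simulate`, and the buffer names are hypothetical and only loosely mirror `paged_kv_indptr_cpu` and `paged_kv_indptr_buffer`. If the host reuses the copy's source buffer before the copy runs, the device receives the wrong step's data. Copying from a separate buffer that later host writes do not touch avoids this.

```python
from collections import deque


class ToyStream:
    """Hypothetical stand-in for a GPU stream: enqueued copies run only on synchronize().

    As with a non-blocking host-to-device copy, the source is read when the
    copy executes, not when it is enqueued.
    """

    def __init__(self):
        self.pending = deque()

    def copy_async(self, dst, src):
        # Record the copy; nothing is read from src yet.
        self.pending.append((dst, src))

    def synchronize(self):
        # Execute all pending copies in order, reading src at this point.
        while self.pending:
            dst, src = self.pending.popleft()
            dst[:] = src


def simulate(use_intermediate_buffer):
    stream = ToyStream()
    indptr_cpu = [0, 0, 0]    # hypothetical host scratch buffer, rewritten every step
    device_step1 = [0, 0, 0]  # hypothetical "device" destination for step 1

    # Step 1: fill the host buffer and launch the async copy.
    indptr_cpu[:] = [0, 2, 5]
    if use_intermediate_buffer:
        # Copy from a separate buffer that later host writes do not touch.
        src = list(indptr_cpu)
    else:
        # Copy source aliases the scratch buffer that step 2 overwrites.
        src = indptr_cpu
    stream.copy_async(device_step1, src)

    # Step 2 starts before step 1's copy has executed and reuses the scratch buffer.
    indptr_cpu[:] = [0, 7, 9]

    stream.synchronize()
    return device_step1


print(simulate(use_intermediate_buffer=False))  # [0, 7, 9]: step 2's data leaked (race)
print(simulate(use_intermediate_buffer=True))   # [0, 2, 5]: the intended step 1 data
```

In real PyTorch code, the same hazard applies to a `copy_(..., non_blocking=True)` from pinned host memory. The host must not modify the source until the copy has completed on the stream.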

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
@WoosukKwon WoosukKwon merged commit 7ffbf27 into main Aug 28, 2025
6 of 9 checks passed
@WoosukKwon WoosukKwon deleted the woosuk/flashinfer-fix-race branch August 28, 2025 21:22
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
…_cpu (vllm-project#23737)

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…_cpu (vllm-project#23737)

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
