
Conversation

@njhill (Member) commented Aug 27, 2025

Also fixes an issue with finished requests not being processed in async scheduling and PP cases, and possible worker race conditions.

Should resolve/replace:

Some possible follow-on work, out of scope for this PR:

  • Avoid busy CPU spin in the worker response-waiting thread
  • Single reactor loop for the engine core: wait on the input queue and worker responses at the same time
  • Uniproc async scheduling for TP=1
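The first follow-up item can be sketched as follows. This is an illustrative sketch only, assuming nothing about vLLM's internals: `WorkerResponseReader` and its methods are hypothetical names, and the point is simply that parking on a condition variable (here via `queue.Queue.get`) replaces a `while not ready: pass` busy loop.

```python
import queue

class WorkerResponseReader:
    """Hypothetical sketch: block on worker responses instead of spinning."""

    def __init__(self):
        # queue.Queue.get() parks the calling thread on an internal
        # condition variable, so it consumes no CPU while idle.
        self._responses = queue.Queue()

    def put_response(self, resp):
        # Called from the worker/IO side when a response arrives.
        self._responses.put(resp)

    def wait_for_response(self, timeout=None):
        # Blocks instead of spinning in a `while not ready: pass` loop.
        return self._responses.get(timeout=timeout)
```

The same effect could be had with a bare `threading.Event` or condition variable; a queue is used here only because it also carries the response payload.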

@gemini-code-assist bot left a comment


Code Review

This pull request introduces several fixes for async scheduling and pipeline parallelism with data parallelism. The main changes involve refactoring the batch queue handling in EngineCore from queue.Queue to collections.deque and adjusting the logic in step_with_batch_queue to better support pipelining. It also adds a fix in ray_utils for pipeline parallelism and extends tests for async scheduling. While the changes are mostly correct, I've found a critical issue in the updated step_with_batch_queue logic that could lead to incorrect behavior in data parallel mode.
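The pipelining idea behind `step_with_batch_queue` can be illustrated with a minimal sketch. Everything below is hypothetical and simplified, not vLLM's actual implementation: the idea is to keep scheduling new batches into a deque while earlier ones are still executing, and only block on the oldest batch once the pipeline is full or no more work can be scheduled.

```python
from collections import deque

def step_with_batch_queue_sketch(scheduler, executor, batch_queue, max_in_flight):
    """Illustrative pipelined step (hypothetical names, not vLLM's API)."""
    scheduled = False
    # Fill the pipeline: schedule a new batch while there is capacity.
    if len(batch_queue) < max_in_flight and scheduler.has_requests():
        scheduler_output = scheduler.schedule()
        future = executor.execute_model_async(scheduler_output)
        batch_queue.appendleft((future, scheduler_output))
        scheduled = True

    if scheduled and len(batch_queue) < max_in_flight:
        # Pipeline not yet full: return without blocking so the caller
        # can immediately schedule another batch.
        return None

    if not batch_queue:
        # Nothing in flight and nothing schedulable.
        return None

    # Drain the oldest in-flight batch, blocking until its result is ready.
    future, scheduler_output = batch_queue.pop()
    model_output = future.result()
    return scheduler.update_from_output(scheduler_output, model_output)
```

The key property is that the blocking `future.result()` call only happens when no further scheduling is possible, which is what keeps multiple pipeline stages busy at once.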

@njhill njhill marked this pull request as draft August 28, 2025 00:17
Also fixes an issue with finished requests not being processed in async scheduling and PP cases.

Signed-off-by: Nick Hill <nhill@redhat.com>
@njhill njhill marked this pull request as ready for review August 28, 2025 00:31
@njhill njhill marked this pull request as draft August 28, 2025 07:04
@njhill njhill marked this pull request as ready for review August 28, 2025 20:30
Signed-off-by: Nick Hill <nhill@redhat.com>
@njhill added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Aug 29, 2025
 logger.info("Batch queue is enabled with size %d",
             self.batch_queue_size)
-self.batch_queue = queue.Queue(self.batch_queue_size)
+self.batch_queue = deque(maxlen=self.batch_queue_size)
@njhill (Member, Author):

The queue is accessed only from the core loop thread so does not need to be threadsafe/blocking.
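A minimal illustration of why the swap is safe here: `collections.deque` gives the same FIFO semantics without `queue.Queue`'s internal locking, which is unnecessary when only one thread (the core loop) ever touches the queue.

```python
from collections import deque

# Single-threaded batch queue: no locking needed because only the
# core loop thread ever touches it.
batch_queue = deque(maxlen=4)

batch_queue.appendleft("batch-0")  # newest batch goes on the left...
batch_queue.appendleft("batch-1")
oldest = batch_queue.pop()         # ...oldest batch comes off the right
print(oldest)  # -> batch-0
```

One caveat: a bounded deque silently evicts from the opposite end when full (unlike `queue.Queue`, which blocks), so the scheduling logic must guarantee the queue never overfills.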

@njhill (Member, Author) commented Aug 29, 2025

@WoosukKwon I think this is ready to review. We're also testing it in wide-ep deployment.

Comment on lines +345 to +348
# Queue is empty. We should not reach here since this method should
# only be called when the scheduler contains requests or the queue
# is non-empty.
return None, False
Collaborator:

Do you mean it's a bug if we reach this code? Should we add an assert in that case?

@njhill (Member, Author):

I'm not sure; it's still a valid state, and by returning this, things stay correct and consistent. So it seems reasonable to keep the method's behaviour like this.

The condition that means we shouldn't reach here is outside of this method (in the main core engine loop).
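The pattern under discussion can be sketched in a few lines. This is a hypothetical helper, not the actual vLLM method: for an empty queue it returns a consistent "no work" result rather than asserting, because the "should not reach here" invariant is enforced by the caller, not by this method.

```python
from collections import deque

def pop_next_batch(batch_queue: deque):
    """Hypothetical sketch of the defensive-return pattern discussed above."""
    if not batch_queue:
        # Valid-but-unexpected state: stay correct/consistent rather than
        # crash; the caller decides whether reaching this indicates a bug.
        return None, False
    return batch_queue.pop(), True
```

An `assert` would turn an invariant violation into a hard failure; the defensive return keeps the method total and pushes the decision to the engine loop.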

@WoosukKwon (Collaborator) left a comment

LGTM. Amazing!

@WoosukKwon WoosukKwon merged commit d90d8eb into vllm-project:main Aug 29, 2025
39 of 41 checks passed
@njhill njhill deleted the async-sched-dp branch August 29, 2025 15:34
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Sep 19, 2025
### What this PR does / why we need it?
based on the vllm-project/vllm#23770,
fix Async scheduling and PP compatibility with DP, also fixes issue with
finished requests not being processed in async scheduling and PP cases,
and possible worker race conditions.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main:
vllm-project/vllm@544fe76

---------

Signed-off-by: jesse <szxfml@gmail.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
hwhaokun pushed a commit to hwhaokun/vllm-ascend that referenced this pull request Nov 19, 2025
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 9, 2025

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1
