[BugFix] Async scheduling and PP compatibility with DP #23770
Conversation
Code Review
This pull request introduces several fixes for async scheduling and pipeline parallelism with data parallelism. The main changes involve refactoring the batch queue handling in EngineCore from queue.Queue to collections.deque and adjusting the logic in step_with_batch_queue to better support pipelining. It also adds a fix in ray_utils for pipeline parallelism and extends tests for async scheduling. While the changes are mostly correct, I've found a critical issue in the updated step_with_batch_queue logic that could lead to incorrect behavior in data parallel mode.
Also fixes issue with finished requests not being processed in async scheduling and PP cases. Signed-off-by: Nick Hill <nhill@redhat.com>
```diff
 logger.info("Batch queue is enabled with size %d",
             self.batch_queue_size)
-self.batch_queue = queue.Queue(self.batch_queue_size)
+self.batch_queue = deque(maxlen=self.batch_queue_size)
```
The queue is accessed only from the core loop thread so does not need to be threadsafe/blocking.
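To illustrate the trade-off being made here, the following is a minimal sketch (not vLLM's actual code; the size of 4 is an arbitrary assumption) of using a plain `collections.deque` where single-threaded access makes `queue.Queue`'s locking and blocking semantics unnecessary:

```python
from collections import deque

# Hypothetical batch queue touched only from one thread: appends and
# pops are O(1) and need no locking, unlike queue.Queue.
batch_queue = deque(maxlen=4)  # assumed batch_queue_size of 4

# Producer side: newest in-flight batch goes on the left.
for batch_id in range(3):
    batch_queue.appendleft(batch_id)

# Consumer side: pop the oldest in-flight batch from the right (FIFO).
oldest = batch_queue.pop()
```

One caveat of this substitution: a full `deque(maxlen=...)` silently evicts from the opposite end on append rather than blocking like `queue.Queue(maxsize=...)`, so the caller must check `len(batch_queue) < batch_queue.maxlen` before appending.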
@WoosukKwon I think this is ready to review. We're also testing it in wide-ep deployment.
```python
# Queue is empty. We should not reach here since this method should
# only be called when the scheduler contains requests or the queue
# is non-empty.
return None, False
```
Do you mean it's a bug if we reach this code? Should we add an assert if that's the case?
I'm not sure; it's still a valid state, and by returning this, things stay correct/consistent. So it seems reasonable to keep it like this in terms of the method's behaviour.
The condition that means we shouldn't reach here is enforced outside of this method (in the main core engine loop).
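To make the state being discussed concrete, here is a hypothetical skeleton of a batch-queue step method (names, structure, and return values are assumptions for illustration, not vLLM's real implementation), showing why the empty-queue branch is a valid no-op inside the method even though the caller is expected to avoid it:

```python
from collections import deque
from typing import Optional, Tuple


class EngineCoreSketch:
    """Hypothetical sketch of a batch-queue step; not vLLM's real code."""

    def __init__(self, batch_queue_size: int = 2):
        self.batch_queue: deque = deque(maxlen=batch_queue_size)
        self.waiting_requests = 0  # stand-in for the scheduler's backlog

    def step_with_batch_queue(self) -> Tuple[Optional[str], bool]:
        # Prefer scheduling a new batch when there is work and pipeline
        # capacity, so stages stay filled.
        if self.waiting_requests and len(self.batch_queue) < self.batch_queue.maxlen:
            self.waiting_requests -= 1
            self.batch_queue.appendleft("in-flight-batch")
            return None, True  # scheduled a batch, no outputs yet

        if self.batch_queue:
            # Otherwise wait on the oldest in-flight batch (FIFO) and
            # return its outputs.
            _ = self.batch_queue.pop()
            return "outputs", True

        # Queue empty and nothing to schedule: a valid, consistent state.
        # The caller (the core busy loop) is expected not to invoke this
        # method in that state, which is why no assert is raised here.
        return None, False
```

In this sketch, the guard that prevents reaching the final branch lives in the caller, matching the point made above about the main core engine loop.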
WoosukKwon left a comment:
LGTM. Amazing!
…23770) Signed-off-by: Nick Hill <nhill@redhat.com>
### What this PR does / why we need it?
Based on vllm-project/vllm#23770: fixes async scheduling and PP compatibility with DP, also fixes an issue with finished requests not being processed in the async scheduling and PP cases, as well as possible worker race conditions.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@544fe76
---------
Signed-off-by: jesse <szxfml@gmail.com>
Also fixes issue with finished requests not being processed in async scheduling and PP cases, and possible worker race conditions.
Should resolve/replace:
Some possible follow-on work, out of scope for this PR: