Conversation

@ywang96 ywang96 commented Aug 25, 2025

Purpose

Shoutout to @LastZhabka for raising a bug in #22711: the scheduler does not take into account that a single request can contain repeated images, which results in a mismatch between the logical space and the physical space. Although this is not a common scenario, one could use it to attack a deployed server.

The solution is simply to add a temporary set that tracks the mm_hashes to be scheduled in the current step.

This PR also does minor refactoring to move the actual accounting of space allocation out of allocation decision time.
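To illustrate the idea, here is a minimal, self-contained sketch of the per-step dedup set. All names (`EncoderCache`, `count_new_entries`, `mm_hashes_to_schedule`) are illustrative, not vLLM's actual API: without the temporary set, a request repeating the same image N times would be counted N times against free space, even though the cache only ever stores one physical copy.

```python
# Hypothetical sketch of the fix; names are illustrative, not vLLM's API.

class EncoderCache:
    """Stands in for the encoder cache manager's physical-space tracking."""
    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.cached = set()  # mm_hashes that already occupy physical space

    def has(self, mm_hash):
        return mm_hash in self.cached

def count_new_entries(mm_hashes, cache):
    """Number of new cache slots a request actually needs this step."""
    mm_hashes_to_schedule = set()  # temporary, per-step dedup set
    needed = 0
    for h in mm_hashes:
        if h in mm_hashes_to_schedule or cache.has(h):
            # Duplicate within this step, or already physically cached:
            # no additional space is required.
            continue
        mm_hashes_to_schedule.add(h)
        needed += 1
    return needed

cache = EncoderCache(num_slots=4)
cache.cached.add("img_a")
# Request with repeated images: only "img_b" needs a new slot.
print(count_new_entries(["img_a", "img_b", "img_b", "img_b"], cache))  # 1
```

Without the dedup set, the same call would report 3 slots needed, so logical accounting would drift from the single physical entry the cache actually holds.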

cc @fake0fan @knlnguyen1802

Test Plan

Test Result

(Optional) Documentation Update


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

fix
Signed-off-by: Roger Wang <hey@rogerw.me>
@mergify mergify bot added the v1 label Aug 25, 2025
Signed-off-by: Roger Wang <hey@rogerw.me>

ywang96 commented Aug 25, 2025

Will add a test case in a few hours...

Roger Wang and others added 6 commits August 25, 2025 11:56
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.io>
@ywang96 ywang96 marked this pull request as ready for review August 25, 2025 20:33
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.io>
@fake0fan
Contributor

It looks like you replaced try_allocate with can_allocate in the scheduling logic.
Since can_allocate only checks whether allocation is possible and does not actually allocate resources, I'm wondering whether we still need self.encoder_cache_manager.free(preempted_req) in scheduler.py. Or does it make no difference whether this line is kept or removed? Please let me know your thoughts.

Signed-off-by: Roger Wang <hey@rogerw.io>

ywang96 commented Aug 26, 2025

> It looks like you replaced try_allocate with can_allocate in the scheduling logic. Since can_allocate only checks if allocation is possible and does not actually allocate resources, I'm wondering whether we still need to add self.encoder_cache_manager.free(preempted_req) in scheduler.py. Does it make no difference whether we add or remove this line? Please let me know your thoughts.

Discussed offline - self.encoder_cache_manager.free(preempted_req) is still needed: the preempted_req might be from a previous step and already occupy physical space, while calling free for a request that is newly preempted in the current step is a no-op anyway.
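A small sketch of why keeping that free call is safe (class and method names here are illustrative, not vLLM's actual EncoderCacheManager interface): freeing is keyed by the request, so freeing a request that never allocated physical space does nothing, while a request preempted after allocating in an earlier step releases its entries as intended.

```python
# Hypothetical sketch; not vLLM's real encoder cache manager.

class EncoderCacheManager:
    def __init__(self):
        self.allocated = {}  # req_id -> set of mm_hashes holding space

    def allocate(self, req_id, mm_hash):
        self.allocated.setdefault(req_id, set()).add(mm_hash)

    def free(self, req_id):
        # Harmless no-op when req_id never allocated anything.
        return self.allocated.pop(req_id, set())

mgr = EncoderCacheManager()
mgr.allocate("old_req", "img_a")
print(mgr.free("old_req"))  # {'img_a'} -- releases the earlier allocation
print(mgr.free("new_req"))  # set()     -- no-op for a fresh request
```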

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 26, 2025 07:07
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 26, 2025
@DarkLight1337 DarkLight1337 merged commit b5d34af into vllm-project:main Aug 26, 2025
36 checks passed
tc-mb pushed a commit to tc-mb/vllm that referenced this pull request Aug 27, 2025
…ject#23544)

Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
…ject#23544)

Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
…ject#23544)

Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: Xiao Yu <xiao.yu@amd.com>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
…ject#23544)

Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025
…ject#23544)

Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…ject#23544)

Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1
