Conversation

@ywang96 ywang96 commented Aug 25, 2025

Purpose

Shoutout to @LastZhabka for raising a bug in #22711: the scheduler does not take into account that a single request can contain repeated images, which results in a mismatch between the logical space and the physical space. Although this is not a common scenario, one could use it to attack a deployed server.

The solution is simply to add a temporary set that tracks the mm_hashes to be scheduled in the current step.

This PR also does minor refactoring to move the actual accounting of space allocation out of allocation decision time.
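To illustrate the idea, here is a minimal, self-contained sketch of the per-step dedup set. All names (`EncoderCache`, `count_new_entries`, `mm_hashes_to_schedule`) are illustrative, not vLLM's actual API: without the temporary set, a request repeating the same image N times would be counted N times against free space, even though the cache only ever stores one physical copy.

```python
# Hypothetical sketch of the fix; names are illustrative, not vLLM's API.

class EncoderCache:
    """Stands in for the encoder cache manager's physical-space tracking."""
    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.cached = set()  # mm_hashes that already occupy physical space

    def has(self, mm_hash):
        return mm_hash in self.cached

def count_new_entries(mm_hashes, cache):
    """Number of new cache slots a request actually needs this step."""
    mm_hashes_to_schedule = set()  # temporary, per-step dedup set
    needed = 0
    for h in mm_hashes:
        if h in mm_hashes_to_schedule or cache.has(h):
            # Duplicate within this step, or already physically cached:
            # no additional space is required.
            continue
        mm_hashes_to_schedule.add(h)
        needed += 1
    return needed

cache = EncoderCache(num_slots=4)
cache.cached.add("img_a")
# Request with repeated images: only "img_b" needs a new slot.
print(count_new_entries(["img_a", "img_b", "img_b", "img_b"], cache))  # 1
```

Without the dedup set, the same call would report 3 slots needed, so logical accounting would drift from the single physical entry the cache actually holds.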

cc @fake0fan @knlnguyen1802

Test Plan

Test Result

(Optional) Documentation Update


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

fix
Signed-off-by: Roger Wang <hey@rogerw.me>
@mergify mergify bot added the v1 label Aug 25, 2025
Signed-off-by: Roger Wang <hey@rogerw.me>

ywang96 commented Aug 25, 2025

Will add a test case in a few hours...

Roger Wang and others added 6 commits August 25, 2025 11:56
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.io>
@ywang96 ywang96 marked this pull request as ready for review August 25, 2025 20:33
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.io>
@fake0fan
Contributor

It looks like you replaced try_allocate with can_allocate in the scheduling logic.
Since can_allocate only checks whether allocation is possible and does not actually allocate resources, I'm wondering whether we still need self.encoder_cache_manager.free(preempted_req) in scheduler.py. Or does it make no difference whether this line is kept or removed? Please let me know your thoughts.

Signed-off-by: Roger Wang <hey@rogerw.io>

ywang96 commented Aug 26, 2025

> It looks like you replaced try_allocate with can_allocate in the scheduling logic. Since can_allocate only checks if allocation is possible and does not actually allocate resources, I'm wondering whether we still need to add self.encoder_cache_manager.free(preempted_req) in scheduler.py. Does it make no difference whether we add or remove this line? Please let me know your thoughts.

Discussed offline - self.encoder_cache_manager.free(preempted_req) is still needed: the preempted_req might be from a previous step and already occupy physical space, while calling free for a request that is newly preempted in the current step is a no-op anyway.
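A small sketch of why keeping that free call is safe (class and method names here are illustrative, not vLLM's actual EncoderCacheManager interface): freeing is keyed by the request, so freeing a request that never allocated physical space does nothing, while a request preempted after allocating in an earlier step releases its entries as intended.

```python
# Hypothetical sketch; not vLLM's real encoder cache manager.

class EncoderCacheManager:
    def __init__(self):
        self.allocated = {}  # req_id -> set of mm_hashes holding space

    def allocate(self, req_id, mm_hash):
        self.allocated.setdefault(req_id, set()).add(mm_hash)

    def free(self, req_id):
        # Harmless no-op when req_id never allocated anything.
        return self.allocated.pop(req_id, set())

mgr = EncoderCacheManager()
mgr.allocate("old_req", "img_a")
print(mgr.free("old_req"))  # {'img_a'} -- releases the earlier allocation
print(mgr.free("new_req"))  # set()     -- no-op for a fresh request
```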

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 26, 2025 07:07
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 26, 2025
@DarkLight1337 DarkLight1337 merged commit b5d34af into vllm-project:main Aug 26, 2025
36 checks passed
tc-mb pushed a commit to tc-mb/vllm that referenced this pull request Aug 27, 2025
…ject#23544)

Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
…ject#23544)

Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
…ject#23544)

Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: Xiao Yu <xiao.yu@amd.com>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
…ject#23544)

Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025
…ject#23544)

Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…ject#23544)

Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1
