
Conversation

@ywang96 (Member) commented Aug 22, 2025

Purpose

This PR is a follow-up to #23308 and #22711. Since multimodal hashes will be required by the upcoming reworked multimodal encoder cache, we want to let users pass in their own multimodal identifiers in case hashing tensors incurs non-negligible overhead.

This PR allows multi_modal_uuids to be passed to AsyncLLM. The structure of multi_modal_uuids mirrors multi_modal_data: each entry is either a string or None (in which case we fall back to the default hashing logic to compute mm_hash for the item).

Follow-up: allow passing uuids from ChatMessages, which will be supported by #23449.

Partially resolves: #22044

Usage

from vllm import LLM
import PIL.Image

llm = LLM(...)

# Refer to the HuggingFace repo for the correct format to use
prompt = "USER: <image>\nWhat is the content of this image?\nASSISTANT:"

# Load the image using PIL.Image
image = PIL.Image.open(...)

# Single-prompt inference
outputs = llm.generate({
    "prompt": prompt,
    "multi_modal_data": {"image": image},
    "multi_modal_uuids": {"image": "placeholder_hash"},
})

# Multi-image inference
image_1 = PIL.Image.open(...)
image_2 = PIL.Image.open(...)
outputs = llm.generate({
    "prompt": prompt,
    "multi_modal_data": {"image": [image_1, image_2]},
    # A `None` uuid falls back to the default content-based hash
    "multi_modal_uuids": {"image": [None, "placeholder_hash_2"]},
})
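For intuition, the fallback behavior described above can be sketched as follows. This is a minimal illustration, not vLLM's actual implementation: `resolve_mm_identifiers` and the SHA-256 hash are assumptions made for this sketch, standing in for the engine's default content-hashing logic.

```python
import hashlib
from typing import Optional


def resolve_mm_identifiers(
    items: list[bytes],
    uuids: Optional[list[Optional[str]]] = None,
) -> list[str]:
    """For each item, use the user-supplied uuid when given;
    otherwise fall back to hashing the item's content (sketch)."""
    if uuids is None:
        uuids = [None] * len(items)
    if len(uuids) != len(items):
        raise ValueError("multi_modal_uuids must match multi_modal_data in length")
    return [
        uid if uid is not None else hashlib.sha256(item).hexdigest()
        for item, uid in zip(items, uuids)
    ]
```

With this shape, `[None, "placeholder_hash_2"]` yields a content hash for the first image and the user string for the second.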

Test Plan

pytest v1/engine/test_processor_multi_modal_uuids.py

Test Result

============================================================== test session starts ===============================================================
platform linux -- Python 3.12.3, pytest-8.3.5, pluggy-1.5.0
rootdir: /home/ubuntu/ywang96/ywang-vllm
configfile: pyproject.toml
plugins: forked-1.6.0, timeout-2.3.1, asyncio-0.24.0, schemathesis-3.39.15, buildkite-test-collector-0.1.9, hypothesis-6.131.0, shard-0.1.2, anyio-4.6.2.post1, rerunfailures-14.0, mock-3.14.0, subtests-0.14.1, hydra-core-1.3.2
asyncio: mode=Mode.STRICT, default_loop_scope=None
collected 6 items                                                                                                                                
Running 6 items in this shard

v1/engine/test_processor_multi_modal_uuids.py ......                                                                                       [100%]

================================================================ warnings summary ================================================================
../.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305
  /home/ubuntu/ywang96/ywang-vllm/.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    ref_error: type[Exception] = jsonschema.RefResolutionError,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================== 6 passed, 1 warning in 1.69s ==========================================================

(Optional) Documentation Update


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: Roger Wang <hey@rogerw.io>
@mergify mergify bot added frontend multi-modality Related to multi-modality (#4194) labels Aug 22, 2025
@ywang96 ywang96 changed the title [Core][Multimodal] Allow passing multi_modal_ids and uuid as custom multimodal identifiers [Core][Multimodal] Allow passing multi_modal_ids as multimodal identifiers. Aug 24, 2025
@mergify mergify bot added the v1 label Aug 24, 2025
logger = init_logger(__name__)

-MultiModalHashDict = Mapping[str, list[str]]
+MultiModalHashDict = dict[str, list[Optional[str]]]
@huachenheli (Contributor) commented Aug 24, 2025:
Based on my other comment on hashing uuid, we might need to have

MultiModalUUIDDict = dict[str, list[Optional[str]]]
MultiModalHashDict = dict[str, list[Optional[str]]]

separately, so they represent raw user uuids and hash values. Even though both are plain strings, it is better to keep their semantics separate.
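One way to encode that separation at the type level is with `typing.NewType`, so a raw user uuid cannot silently stand in for a computed hash. This is a hypothetical sketch; the `UserUUID` and `MMHash` names are not from the PR.

```python
from typing import NewType, Optional

# Distinct nominal types over str, so type checkers flag mix-ups
UserUUID = NewType("UserUUID", str)
MMHash = NewType("MMHash", str)

# Per-modality mappings: user uuids may be None (auto-hash later),
# final hashes are always present
MultiModalUUIDDict = dict[str, list[Optional[UserUUID]]]
MultiModalHashDict = dict[str, list[MMHash]]
```

At runtime both wrappers are plain strings; the distinction only exists for static analysis, which is enough to keep the two dictionaries from being confused.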

@ywang96 (Member, Author) replied:
addressed in 977811b

are allowed and will be auto-hashed downstream.
"""

def _validate_single(single_prompt: Union[dict, str]) -> None:
@huachenheli (Contributor) commented:

We should also validate that if an entry in multi_modal_data[modality] is None, its corresponding uuid must not be None.
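The suggested check could look roughly like the sketch below. This is hypothetical illustration only; `validate_mm_uuids` is not part of the PR, and the real validation lives in the processor.

```python
from typing import Any, Optional


def validate_mm_uuids(
    mm_data: dict[str, list[Any]],
    mm_uuids: Optional[dict[str, list[Optional[str]]]],
) -> None:
    """Reject requests where a None data item also has no uuid,
    since nothing would identify that item (sketch)."""
    for modality, items in mm_data.items():
        ids = (mm_uuids or {}).get(modality) or [None] * len(items)
        if len(ids) != len(items):
            raise ValueError(f"uuid list length mismatch for modality {modality!r}")
        for i, (item, uid) in enumerate(zip(items, ids)):
            if item is None and uid is None:
                raise ValueError(f"{modality} item {i} has neither data nor uuid")
```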

@ywang96 (Member, Author) replied:

Today we don't allow users to pass in a None multi_modal_data item.

I see the point: it would let a user pass a uuid for an already-cached object so no data transmission is needed for the object itself, but that's outside the scope of this PR and I'll defer it to a later one.

Roger Wang added 4 commits August 24, 2025 23:56
@DarkLight1337 (Member) commented:
After #23018 is merged, we should validate on P0 that if the user passes in UUID, the corresponding item should exist in the cache, and reject the request otherwise, in order to avoid crashing the engine in P1.
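A minimal sketch of such a P0-side guard, assuming the cache exposes its keys; the `reject_unknown_uuids` name and the cache shape are assumptions, not the PR's API.

```python
def reject_unknown_uuids(uuids: list[str], cache_keys: set[str]) -> None:
    """Reject a request up front if any user-supplied uuid has no
    corresponding cached item, instead of crashing the engine later."""
    missing = [u for u in uuids if u not in cache_keys]
    if missing:
        raise ValueError(f"multimodal items not found in cache: {missing}")
```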

Roger Wang added 2 commits August 25, 2025 02:43
@ywang96 ywang96 changed the title [Core][Multimodal] Allow passing multi_modal_ids as multimodal identifiers. [Core][Multimodal] Allow passing multi_modal_uuids as multimodal identifiers. Aug 25, 2025
@ywang96 ywang96 requested a review from hmellor as a code owner August 29, 2025 07:52
@mergify mergify bot added the documentation Improvements or additions to documentation label Aug 29, 2025
@ywang96 ywang96 requested a review from aarnphm as a code owner August 29, 2025 08:24
@ywang96 ywang96 removed the request for review from aarnphm August 29, 2025 08:24
@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 29, 2025
@ywang96 ywang96 merged commit 749be00 into vllm-project:main Aug 31, 2025
40 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Multi-modality Core Aug 31, 2025
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

Labels

documentation Improvements or additions to documentation frontend multi-modality Related to multi-modality (#4194) ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[RFC]: Optimize Input Media Processing in vLLM

3 participants