
Conversation

@ywang96 (Member) commented Aug 22, 2025

Purpose

This PR is a follow-up to #23308 and #22711. Since multimodal hashes will be required by the upcoming reworked multimodal encoder cache, we want to let users pass in their own multimodal identifiers in case hashing tensors incurs non-negligible overhead.

This PR allows multi_modal_uuids to be passed to AsyncLLM. The structure of multi_modal_uuids mirrors multi_modal_data: each entry is either a string or None (in which case we fall back to the default hashing logic to compute mm_hash for the item).

Follow-up: allow passing uuids from ChatMessages, which will be supported by #23449.

Partially resolves: #22044

Usage

from vllm import LLM
import PIL.Image

llm = LLM(...)

# Refer to the HuggingFace repo for the correct format to use
prompt = "USER: <image>\nWhat is the content of this image?\nASSISTANT:"

# Load the image using PIL.Image
image = PIL.Image.open(...)

# Single-prompt inference
outputs = llm.generate({
    "prompt": prompt,
    "multi_modal_data": {"image": image},
    "multi_modal_uuids": {"image": "placeholder_hash"},
})

# Multi-image inference
image_1 = PIL.Image.open(...)
image_2 = PIL.Image.open(...)
outputs = llm.generate({
    "prompt": prompt,
    "multi_modal_data": {"image": [image_1, image_2]},
    # A `None` uuid falls back to the default content-based hash
    "multi_modal_uuids": {"image": [None, "placeholder_hash_2"]},
})
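For intuition, the fallback behavior described above can be sketched as follows. This is a minimal illustration, not vLLM's actual implementation: `resolve_mm_identifiers` and the SHA-256 hash are assumptions made for this sketch, standing in for the engine's default content-hashing logic.

```python
import hashlib
from typing import Optional


def resolve_mm_identifiers(
    items: list[bytes],
    uuids: Optional[list[Optional[str]]] = None,
) -> list[str]:
    """For each item, use the user-supplied uuid when given;
    otherwise fall back to hashing the item's content (sketch)."""
    if uuids is None:
        uuids = [None] * len(items)
    if len(uuids) != len(items):
        raise ValueError("multi_modal_uuids must match multi_modal_data in length")
    return [
        uid if uid is not None else hashlib.sha256(item).hexdigest()
        for item, uid in zip(items, uuids)
    ]
```

With this shape, `[None, "placeholder_hash_2"]` yields a content hash for the first image and the user string for the second.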

Test Plan

pytest v1/engine/test_processor_multi_modal_uuids.py

Test Result

============================================================== test session starts ===============================================================
platform linux -- Python 3.12.3, pytest-8.3.5, pluggy-1.5.0
rootdir: /home/ubuntu/ywang96/ywang-vllm
configfile: pyproject.toml
plugins: forked-1.6.0, timeout-2.3.1, asyncio-0.24.0, schemathesis-3.39.15, buildkite-test-collector-0.1.9, hypothesis-6.131.0, shard-0.1.2, anyio-4.6.2.post1, rerunfailures-14.0, mock-3.14.0, subtests-0.14.1, hydra-core-1.3.2
asyncio: mode=Mode.STRICT, default_loop_scope=None
collected 6 items                                                                                                                                
Running 6 items in this shard

v1/engine/test_processor_multi_modal_uuids.py ......                                                                                       [100%]

================================================================ warnings summary ================================================================
../.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305
  /home/ubuntu/ywang96/ywang-vllm/.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    ref_error: type[Exception] = jsonschema.RefResolutionError,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================== 6 passed, 1 warning in 1.69s ==========================================================

(Optional) Documentation Update


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: Roger Wang <hey@rogerw.io>
@mergify mergify bot added frontend multi-modality Related to multi-modality (#4194) labels Aug 22, 2025
@ywang96 ywang96 changed the title [Core][Multimodal] Allow passing multi_modal_ids and uuid as custom multimodal identifiers [Core][Multimodal] Allow passing multi_modal_ids as multimodal identifiers. Aug 24, 2025
@mergify mergify bot added the v1 label Aug 24, 2025
logger = init_logger(__name__)

-MultiModalHashDict = Mapping[str, list[str]]
+MultiModalHashDict = dict[str, list[Optional[str]]]
@huachenheli (Contributor) commented Aug 24, 2025:
Based on my other comment on hashing uuid, we might need to have

MultiModalUUIDDict = dict[str, list[Optional[str]]]
MultiModalHashDict = dict[str, list[Optional[str]]]

separately, so they represent raw user uuids and hash values. Even though both are plain strings, it is better to keep their semantics separate.
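One way to encode that separation at the type level is with `typing.NewType`, so a raw user uuid cannot silently stand in for a computed hash. This is a hypothetical sketch; the `UserUUID` and `MMHash` names are not from the PR.

```python
from typing import NewType, Optional

# Distinct nominal types over str, so type checkers flag mix-ups
UserUUID = NewType("UserUUID", str)
MMHash = NewType("MMHash", str)

# Per-modality mappings: user uuids may be None (auto-hash later),
# final hashes are always present
MultiModalUUIDDict = dict[str, list[Optional[UserUUID]]]
MultiModalHashDict = dict[str, list[MMHash]]
```

At runtime both wrappers are plain strings; the distinction only exists for static analysis, which is enough to keep the two dictionaries from being confused.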

@ywang96 (Member, Author) replied:
addressed in 977811b

are allowed and will be auto-hashed downstream.
"""

def _validate_single(single_prompt: Union[dict, str]) -> None:
@huachenheli (Contributor) commented:

We should also validate that if an entry in multi_modal_data[modality] is None, its corresponding uuid must not be None.
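The suggested check could look roughly like the sketch below. This is hypothetical illustration only; `validate_mm_uuids` is not part of the PR, and the real validation lives in the processor.

```python
from typing import Any, Optional


def validate_mm_uuids(
    mm_data: dict[str, list[Any]],
    mm_uuids: Optional[dict[str, list[Optional[str]]]],
) -> None:
    """Reject requests where a None data item also has no uuid,
    since nothing would identify that item (sketch)."""
    for modality, items in mm_data.items():
        ids = (mm_uuids or {}).get(modality) or [None] * len(items)
        if len(ids) != len(items):
            raise ValueError(f"uuid list length mismatch for modality {modality!r}")
        for i, (item, uid) in enumerate(zip(items, ids)):
            if item is None and uid is None:
                raise ValueError(f"{modality} item {i} has neither data nor uuid")
```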

@ywang96 (Member, Author) replied:

Today we don't allow users to pass in a None multi_modal_data item.

I see the point: it would let a user pass a uuid for an already-cached object so no data transmission is needed for the object itself, but that's outside the scope of this PR and I'll defer it to a later one.

Roger Wang added 4 commits August 24, 2025 23:56
@DarkLight1337 (Member) commented:
After #23018 is merged, we should validate on P0 that if the user passes in UUID, the corresponding item should exist in the cache, and reject the request otherwise, in order to avoid crashing the engine in P1.
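A minimal sketch of such a P0-side guard, assuming the cache exposes its keys; the `reject_unknown_uuids` name and the cache shape are assumptions, not the PR's API.

```python
def reject_unknown_uuids(uuids: list[str], cache_keys: set[str]) -> None:
    """Reject a request up front if any user-supplied uuid has no
    corresponding cached item, instead of crashing the engine later."""
    missing = [u for u in uuids if u not in cache_keys]
    if missing:
        raise ValueError(f"multimodal items not found in cache: {missing}")
```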

Roger Wang added 2 commits August 25, 2025 02:43
@ywang96 ywang96 changed the title [Core][Multimodal] Allow passing multi_modal_ids as multimodal identifiers. [Core][Multimodal] Allow passing multi_modal_uuids as multimodal identifiers. Aug 25, 2025
@ywang96 ywang96 requested a review from hmellor as a code owner August 29, 2025 07:52
@mergify mergify bot added the documentation Improvements or additions to documentation label Aug 29, 2025
@ywang96 ywang96 requested a review from aarnphm as a code owner August 29, 2025 08:24
@ywang96 ywang96 removed the request for review from aarnphm August 29, 2025 08:24
@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 29, 2025
@ywang96 ywang96 merged commit 749be00 into vllm-project:main Aug 31, 2025
40 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Multi-modality Core Aug 31, 2025
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

Labels

documentation Improvements or additions to documentation frontend multi-modality Related to multi-modality (#4194) ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[RFC]: Optimize Input Media Processing in vLLM

3 participants