Conversation

@pratapyash (Contributor) commented Sep 9, 2025

Add LoRA support to Mistral's Voxtral models (mistralai/Voxtral-Mini-3B-2507 & mistralai/Voxtral-Small-24B-2507), scoped strictly to the language model components.

Changes

  • Implement the SupportsLoRA interface on VoxtralForConditionalGeneration
  • Add a get_mm_mapping() method to scope LoRA modules (see the sketch after this list):
      • language_model: text-generation component (LoRA-enabled)
      • connector: audio_language_adapter
      • tower_model: whisper_encoder (excluded from LoRA)
  • Update docs/models/supported_models.md to mark Voxtral as LoRA-supported (✅︎)
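
For reference, here is a minimal sketch of what the mapping looks like, assuming the MultiModelKeys.from_string_field pattern that other vLLM multimodal models use for get_mm_mapping(); the module names below are taken from the list above, and the merged code is authoritative:

import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsLoRA
from vllm.model_executor.models.module_mapping import MultiModelKeys


class VoxtralForConditionalGeneration(nn.Module, SupportsLoRA):
    ...

    def get_mm_mapping(self) -> MultiModelKeys:
        """Route LoRA to the text decoder only; leave the audio stack untouched."""
        return MultiModelKeys.from_string_field(
            language_model="language_model",     # LoRA-enabled text decoder
            connector="audio_language_adapter",  # audio-to-text projector
            tower_model="whisper_encoder",       # audio encoder, excluded from LoRA
        )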

Closes #24516

@gemini-code-assist (bot) left a comment

Code Review

This pull request adds LoRA support for Mistral's Voxtral models, correctly scoping the adapters to the language model components. The changes are implemented by inheriting from the SupportsLoRA interface and providing a get_mm_mapping method to distinguish between the language model, connector, and audio tower. The implementation is clean, correct, and follows the established patterns in the vLLM codebase for enabling LoRA on multimodal models. The accompanying documentation update is also accurate. Overall, this is a solid contribution.

@mergify (bot) added the documentation label (Improvements or additions to documentation) Sep 9, 2025
@pratapyash (Contributor, Author) commented Sep 9, 2025

To test the implementation, use this test adapter for mistralai/Voxtral-Small-3B-2507: yashpratap/Voxtral-Small-3B-2507-generic-adapter

Script to spin up the vLLM OpenAI-compatible server:

export VLLM_LOGGING_LEVEL=DEBUG
export VLLM_LOGGING_PREFIX="[vllm]"

MODEL="mistralai/Voxtral-Mini-3B-2507"
PORT=8000
GPU_MEM_UTIL=0.95
HOST="0.0.0.0"
TENSOR_PARALLEL_SIZE=1
MAX_MODEL_LEN=8192

# LoRA configuration
LORA0_NAME="lora0"
LORA0_PATH="/path/to/your/lora/adapter"
# ---------------------------------------------------------------------------

python -m vllm.entrypoints.openai.api_server \
  --model "$MODEL" \
  --tokenizer-mode mistral \
  --config-format mistral \
  --load-format mistral \
  --host "$HOST" \
  --port "$PORT" \
  --gpu-memory-utilization "$GPU_MEM_UTIL" \
  --tensor-parallel-size "$TENSOR_PARALLEL_SIZE" \
  --max-model-len "$MAX_MODEL_LEN" \
  --trust-remote-code \
  --enable-lora \
  --max-loras 1 \
  --max-lora-rank 32 \
  --lora-modules "$LORA0_NAME=$LORA0_PATH" \
  --served-model-name "$MODEL" \
  --compilation-config '{"level": 3, "cudagraph_mode": "PIECEWISE"}'

Send request:

import requests
import json
import base64

# Test the vLLM server endpoint
url = "http://localhost:8000/v1/chat/completions"
headers = {"Content-Type": "application/json"}

# Read and encode the audio file
audio_path = "/path/to/your/audio/file.wav"
with open(audio_path, "rb") as audio_file:
    audio_data = base64.b64encode(audio_file.read()).decode('utf-8')

# "model" is set to the adapter name registered via --lora-modules, so the
# request is served through the LoRA adapter rather than the base model.
payload = {
    "model": "lora0",
    "messages": [
        {"role": "system", "content": "Transcribe the following audio as it is."},
        {"role": "user", "content": [
            {
                "type": "input_audio",
                "input_audio": {
                    "data": audio_data,
                    "format": "wav"
                }
            }
        ]}
    ],
    "max_tokens": 1024,
    "temperature": 0.1
}

response = requests.post(url, headers=headers, json=payload, timeout=30)
result = response.json()
print(json.dumps(result, indent=2))
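
As a quick sanity check (a sketch assuming the server above is running on localhost:8000), the OpenAI-compatible /v1/models endpoint should list both the base model and the lora0 adapter registered via --lora-modules:

import requests

# List the models the server exposes; with --enable-lora and --lora-modules,
# the adapter name ("lora0") should appear alongside the base model id.
models = requests.get("http://localhost:8000/v1/models", timeout=10).json()
print([m["id"] for m in models["data"]])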

@pratapyash (Contributor, Author) commented

cc: @DarkLight1337, can you review and approve this so it can be merged?

@jeejeelee (Collaborator) commented

Please fix the branch conflict first.

@pratapyash (Contributor, Author) commented

Conflicts resolved!

@jeejeelee (Collaborator) left a comment

Thank you. I assume you have tested this locally

@jeejeelee added the ready label (ONLY add when PR is ready to merge/full CI is needed) Sep 10, 2025
@jeejeelee enabled auto-merge (squash) September 10, 2025 10:28
@pratapyash (Contributor, Author) commented

Thank you. I assume you have tested this locally

Yes, I did. I also added scripts that I used to test LoRA support for reference.

@pratapyash (Contributor, Author) commented Sep 10, 2025

@DarkLight1337 most of the tests have passed, we can merge!

@vllm-bot merged commit 9e3c3a7 into vllm-project:main Sep 10, 2025 (47 of 50 checks passed)
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Labels: documentation, ready

Linked issue: [Feature]: Add LoRA adapter support for Mistral's Voxtral models