[LoRA]: Add LoRA support to Mistral's Voxtral models #24517
Conversation
Code Review
This pull request adds LoRA support for Mistral's Voxtral models, correctly scoping the adapters to the language model components. The changes are implemented by inheriting from the SupportsLoRA interface and providing a get_mm_mapping method to distinguish between the language model, connector, and audio tower. The implementation is clean, correct, and follows the established patterns in the vLLM codebase for enabling LoRA on multimodal models. The accompanying documentation update is also accurate. Overall, this is a solid contribution.
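For readers unfamiliar with the pattern the review refers to, here is a minimal sketch of what a `SupportsLoRA` + `get_mm_mapping()` change typically looks like in vLLM. The class body and the module prefixes (`language_model`, `audio_language_adapter`, `whisper_encoder`) are illustrative assumptions, not the exact names from this PR.

```python
# Minimal sketch of the SupportsLoRA + get_mm_mapping pattern.
# Module prefixes below are assumptions, not the exact names in the Voxtral diff.
from torch import nn

from vllm.model_executor.models.interfaces import SupportsLoRA
from vllm.model_executor.models.module_mapping import MultiModelKeys


class VoxtralForConditionalGeneration(nn.Module, SupportsLoRA):
    def get_mm_mapping(self) -> MultiModelKeys:
        # Scope LoRA to the language model only; the connector and the
        # audio tower are excluded from adapter targeting.
        return MultiModelKeys.from_string_field(
            language_model="language_model",     # assumed prefix
            connector="audio_language_adapter",  # assumed prefix
            tower_model="whisper_encoder",       # assumed prefix
        )
```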
To test the implementation, use this test adapter.

Script to spin up the vLLM server:

```bash
export VLLM_LOGGING_LEVEL=DEBUG
export VLLM_LOGGING_PREFIX="[vllm]"
MODEL="mistralai/Voxtral-Mini-3B-2507"
PORT=8000
GPU_MEM_UTIL=0.95
HOST="0.0.0.0"
TENSOR_PARALLEL_SIZE=1
MAX_MODEL_LEN=8192
# LoRA configuration
LORA0_NAME="lora0"
LORA0_PATH="/path/to/your/lora/adapter"
# ---------------------------------------------------------------------------
python -m vllm.entrypoints.openai.api_server \
--model "$MODEL" \
--tokenizer-mode mistral \
--config-format mistral \
--load-format mistral \
--host "$HOST" \
--port "$PORT" \
--gpu-memory-utilization "$GPU_MEM_UTIL" \
--tensor-parallel-size "$TENSOR_PARALLEL_SIZE" \
--max-model-len "$MAX_MODEL_LEN" \
--trust-remote-code \
--enable-lora \
--max-loras 1 \
--max-lora-rank 32 \
--lora-modules "$LORA0_NAME=$LORA0_PATH" \
--served-model-name "$MODEL" \
    --compilation-config '{"level": 3, "cudagraph_mode": "PIECEWISE"}'
```

Send request:

```python
import requests
import json
import base64
# Test the vLLM server endpoint
url = "http://localhost:8000/v1/chat/completions"
headers = {"Content-Type": "application/json"}
# Read and encode the audio file
audio_path = "/path/to/your/audio/file.wav"
with open(audio_path, "rb") as audio_file:
audio_data = base64.b64encode(audio_file.read()).decode('utf-8')
payload = {
"model": "lora0",
"messages": [
{"role": "system", "content": "Transcribe the following audio as it is."},
{"role": "user", "content": [
{
"type": "input_audio",
"input_audio": {
"data": audio_data,
"format": "wav"
}
}
]}
],
"max_tokens": 1024,
"temperature": 0.1
}
response = requests.post(url, headers=headers, json=payload, timeout=30)
result = response.json()
print(json.dumps(result, indent=2))
```
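As a quick sanity check (assuming the serve command above is used as-is on localhost:8000), the adapter registered via `--lora-modules` should also appear in the server's model list:

```python
# Sanity check: the LoRA adapter registered via --lora-modules should be listed
# alongside the base model (assumes the server above is running on localhost:8000).
import requests

models = requests.get("http://localhost:8000/v1/models", timeout=10).json()
print([m["id"] for m in models["data"]])  # expect the base model and "lora0"
```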
cc: @DarkLight1337, could you review and approve this so it can be merged?
Please fix the branch conflicts first.
Signed-off-by: Yash Pratap Singh <yashsingh20001@gmail.com>
Conflicts resolved!
jeejeelee left a comment
Thank you. I assume you have tested this locally.
Yes, I did. I also added the scripts I used to test LoRA support, for reference.
@DarkLight1337, most of the tests have passed; we can merge!
Add LoRA support to Mistral's Voxtral models (`mistralai/Voxtral-Mini-3B-2507` and `mistralai/Voxtral-Small-24B-2507`), scoped strictly to the language model components.

Changes
- `SupportsLoRA` interface on `VoxtralForConditionalGeneration`
- `get_mm_mapping()` method to filter LoRA modules
- `docs/models/supported_models.md` updated to mark Voxtral as LoRA-supported (✅︎)

Closes #24516
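For completeness, a hedged sketch of exercising the adapter through vLLM's offline Python API. The adapter path is a placeholder, the text-only chat message is used for brevity, and the arguments simply mirror the server flags above rather than anything prescribed by this PR.

```python
# Sketch only: offline inference with a LoRA adapter on Voxtral (paths are placeholders).
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="mistralai/Voxtral-Mini-3B-2507",
    tokenizer_mode="mistral",
    config_format="mistral",
    load_format="mistral",
    enable_lora=True,
    max_lora_rank=32,
)

outputs = llm.chat(
    [{"role": "user", "content": "Say hello."}],  # text-only message for brevity
    SamplingParams(max_tokens=32, temperature=0.1),
    lora_request=LoRARequest("lora0", 1, "/path/to/your/lora/adapter"),
)
print(outputs[0].outputs[0].text)
```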