fix(qwen_vl): add semaphore to serialize video decoding for thread safety #15506
+5
−1
This commit fixes a concurrency bug in the Qwen VL video processor where
multiple concurrent requests decoding different videos would cause crashes.
Problem
When multiple async requests process different videos simultaneously,
the decord library's VideoReader.get_batch() call fails with:
Error code -11 (EAGAIN) indicates resource contention in FFmpeg's
internal threaded decoder, which is not thread-safe for concurrent
multi-file access.
Root Cause
The decord library uses FFmpeg internally for video decoding. When
multiple VideoReader instances decode different video files at the same
time, FFmpeg's internal thread workers collide, causing the EAGAIN error.
This is a known issue with decord due to C package conflicts that are
difficult to resolve at the library level.
Solution
Add an asyncio.Semaphore(1) to serialize the get_batch() call, ensuring
only one video decoding operation runs at a time.
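A minimal sketch of the serialization pattern follows. It is not the code from this PR: `FakeReader` is a hypothetical stand-in for decord's `VideoReader`, and the sketch assumes the blocking decode is offloaded with `asyncio.to_thread` (Python 3.9+). The stand-in records how many decodes overlap, so the effect of the semaphore is observable.

```python
import asyncio
import threading
import time


class FakeReader:
    """Hypothetical stand-in for decord.VideoReader; records peak concurrency."""

    active = 0
    peak = 0
    _lock = threading.Lock()

    def get_batch(self, indices):
        with FakeReader._lock:
            FakeReader.active += 1
            FakeReader.peak = max(FakeReader.peak, FakeReader.active)
        time.sleep(0.01)  # simulate blocking decode work
        with FakeReader._lock:
            FakeReader.active -= 1
        return list(indices)


async def decode_frames(reader, indices, semaphore):
    # Only one task at a time may be inside get_batch().
    async with semaphore:
        # Assumption: the blocking call runs in a worker thread.
        return await asyncio.to_thread(reader.get_batch, indices)


async def main():
    # Create the semaphore inside the running event loop.
    semaphore = asyncio.Semaphore(1)
    readers = [FakeReader() for _ in range(4)]
    tasks = (decode_frames(r, [0, 1, 2], semaphore) for r in readers)
    return await asyncio.gather(*tasks)


results = asyncio.run(main())
```

With `Semaphore(1)` the four simulated decodes run one after another, so `FakeReader.peak` stays at 1. Raising the semaphore count would let the calls overlap again.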
Note: This is a Quick Fix
This is a minimal fix to quickly resolve the crash. The semaphore
serializes video decoding, which may slightly reduce throughput when
processing multiple different videos concurrently.
Future Improvement: OpenCV-based Decoding
A more comprehensive solution would be to use OpenCV (cv2.VideoCapture)
as the primary video decoder with decord as a fallback. This approach
is being adopted by QwenLM/Qwen3-VL in PR #1078.
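Implementing this in SGLang would require:
Modify load_video() in sglang/srt/utils/common.py:
Modify preprocess_video() in qwen_vl.py:
import cv2
import numpy as np

def _decode_with_opencv(video_path: str, frame_indices: np.ndarray) -> np.ndarray:
    cap = cv2.VideoCapture(video_path)
    frames = []
    for idx in frame_indices:
        # Seek to the requested frame, then decode it.
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ret, frame = cap.read()
        if ret:
            # OpenCV returns BGR; convert to RGB. Frames that fail to decode are skipped.
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    if not frames:
        raise ValueError(f"No frames could be decoded from {video_path}")
    return np.stack(frames)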
This would allow full parallel video decoding while maintaining reliability.
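The primary-with-fallback arrangement described above might be wired roughly as follows. This is a sketch under stated assumptions, not the Qwen3-VL or SGLang implementation: `decode_with_fallback` is a hypothetical helper, and the two decoders are passed in as plain callables so the example runs without OpenCV or decord installed.

```python
from typing import Callable, List, Sequence

# A decoder takes a video path and frame indices and returns decoded frames.
Decoder = Callable[[str, Sequence[int]], List]


def decode_with_fallback(
    video_path: str,
    frame_indices: Sequence[int],
    primary: Decoder,
    fallback: Decoder,
) -> List:
    """Try the primary decoder; use the fallback if it fails or returns too few frames."""
    try:
        frames = primary(video_path, frame_indices)
        if len(frames) == len(frame_indices):
            return frames
    except Exception:
        # Broad catch is deliberate in this sketch: any primary failure triggers the fallback.
        pass
    return fallback(video_path, frame_indices)


# Stand-in decoders for illustration only.
def flaky_primary(path, indices):
    raise RuntimeError("simulated decode failure")


def steady_fallback(path, indices):
    return [f"{path}:{i}" for i in indices]


frames = decode_with_fallback("clip.mp4", [0, 5], flaky_primary, steady_fallback)
```

In a real integration the primary would be an OpenCV-based function such as `_decode_with_opencv` and the fallback a decord-based one. Checking the frame count catches the case where the primary silently skips frames it could not read.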
Happy to submit a follow-up PR for the OpenCV implementation.
References
dmlc/decord#269 (get_batch hang issue): "get_batch hang in some condition. I can provide the reproduce repo."