Refactor mm processors and Enable mixed modality processing #7629

@JustinTong0323 JustinTong0323 commented Jun 29, 2025

Motivation

Refactors the multimodal processor to support mixed modalities (images and audio) in the same request.

It introduces a unified processing flow that handles different input types, reducing code duplication and improving maintainability. The new design allows creating multimodal data items directly from processor outputs, streamlining the data handling process.

Modifications

Checklist

Refactors the multimodal processor to support mixed modalities (images and audio) in the same request.

It introduces a unified processing flow that handles different input types, reducing code duplication and improving maintainability.
The new design allows creating multimodal data items directly from processor outputs, streamlining the data handling process.

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>

@JustinTong0323 JustinTong0323 mentioned this pull request Jun 28, 2025
@sgl-project sgl-project locked and limited conversation to collaborators Jun 29, 2025
@sgl-project sgl-project unlocked this conversation Jun 29, 2025
media_token_pairs.append((mm_inputs.im_start_id, mm_inputs.im_end_id))
if mm_inputs.audio_start_id is not None:
media_token_pairs.append((mm_inputs.audio_start_id, mm_inputs.audio_end_id))
print(f"DEBUG: gemma3n_mm.pad_input_ids: {media_token_pairs=}")
@mickqian mickqian Jun 29, 2025

We should consider managing all these special tokens in one MultimodalSpecialTokens. When building it, the chat template of this model can be useful, as it already contains some special tokens.
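For illustration, a rough sketch of what such a consolidated container might look like (the class, field, and method names here are hypothetical, not the actual sglang MultimodalSpecialTokens API):

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SpecialTokenPairsSketch:
    """Hypothetical container grouping per-modality start/end token ids."""

    image: Optional[Tuple[int, int]] = None  # (im_start_id, im_end_id)
    audio: Optional[Tuple[int, int]] = None  # (audio_start_id, audio_end_id)
    video: Optional[Tuple[int, int]] = None

    def as_pairs(self) -> List[Tuple[int, int]]:
        # Only the pairs this model actually defines are returned.
        return [p for p in (self.image, self.audio, self.video) if p is not None]
```

Built once per model, possibly from its chat template, such a container could replace the ad-hoc appends in the hunk above.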

"input_features": Modality.AUDIO,
"input_features_mask": Modality.AUDIO,
# Video-related attributes
"video_grid_thws": Modality.VIDEO,
Collaborator

Maybe it would be more appropriate to pass this dict model-specifically?

Collaborator

We should simplify this; only the real data attributes are needed.

Collaborator Author
@JustinTong0323 JustinTong0323 Jun 29, 2025

I think moving the mapping to the subclass might be a good solution?

Collaborator

We can keep the logic in the parent class while letting the subclass decide the actual modality by providing a mapping. This big dict works well in most scenarios, but it's still a bit hacky.
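A minimal sketch of that split, assuming the parent class keeps the collection logic and each subclass supplies only the attribute-to-modality entries it actually emits (class and attribute names are illustrative):

```python
class BaseProcessorSketch:
    # Subclasses override this with the attributes they actually produce.
    ATTR_TO_MODALITY: dict = {}

    def modality_of(self, attr_name: str):
        return self.ATTR_TO_MODALITY.get(attr_name)


class Gemma3nProcessorSketch(BaseProcessorSketch):
    ATTR_TO_MODALITY = {
        "pixel_values": "image",
        "input_features": "audio",
        "input_features_mask": "audio",
    }
```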

Collaborator Author

Need to think of a better solution 🤔 Maybe we should first refactor the MMItem to avoid directly storing the additional attributes (processor outputs)

items[modality] = MultimodalDataItem(modality=modality)

# Set attribute
if hasattr(items[modality], attr_name):
Collaborator

We can skip this check?

"""Create mm_items directly from processor output."""
items = {} # modality -> MultimodalDataItem

for attr_name, value in data_dict.items():
Collaborator

As above, decide the modality based on whether the real data attribute is present in this dict.
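As an illustration of that idea, a minimal sketch that decides the modality from which real data attribute is present and groups values per modality (the helper name and return shape are hypothetical, not the PR's exact implementation):

```python
def collect_items_sketch(data_dict: dict, attr_to_modality: dict) -> list:
    items = {}  # modality -> dict of collected attributes
    for attr_name, value in data_dict.items():
        modality = attr_to_modality.get(attr_name)
        if modality is None or value is None:
            continue  # skip non-multimodal outputs such as input_ids
        items.setdefault(modality, {})[attr_name] = value
    return list(items.values())
```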

Refactors the multimodal token padding logic to directly replace specific multimodal tokens (image, audio, video) with their corresponding padding values.

This change replaces the previous approach of identifying contiguous regions based on start and end tokens with a more straightforward method of replacing individual tokens. It streamlines the padding process and avoids potential issues with mismatched region counts.

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
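A minimal sketch of the token-replacement approach described above, assuming plain Python lists and illustrative token ids (the real implementation derives pad values from the multimodal items):

```python
def pad_input_tokens_sketch(input_ids, token_to_pad):
    # token_to_pad maps a multimodal token id (image/audio/video) to the
    # pad value that should take its place in the sequence.
    return [token_to_pad.get(tok, tok) for tok in input_ids]


# Every occurrence of the (illustrative) image token id 1000 is replaced.
assert pad_input_tokens_sketch([5, 1000, 1000, 7], {1000: -1}) == [5, -1, -1, 7]
```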
Simplifies multimodal processor logic by ensuring image and audio data are consistently handled as lists.

Removes redundant checks for empty or single-element inputs within individual processor implementations, promoting code reuse and maintainability.

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
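The list normalization amounts to something like the following sketch (the helper name is illustrative):

```python
def as_list(data):
    """Normalize None, a single item, or a list into a list."""
    if data is None:
        return []
    return data if isinstance(data, list) else [data]


assert as_list(None) == []
assert as_list("a.png") == ["a.png"]
assert as_list(["a.png", "b.wav"]) == ["a.png", "b.wav"]
```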
@JustinTong0323 JustinTong0323 marked this pull request as draft June 29, 2025 20:05
@JustinTong0323 JustinTong0323 changed the title from "Refactor base_processor: Enables mixed modality processing" to "Refactor mm processors: Enable mixed modality processing" Jun 29, 2025
Refactors the VILA multimodal processing logic for better organization
and efficiency. It streamlines the data flow and adds explicit handling
of image and video token IDs for improved clarity and future
extensibility.

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Renames `_create_mm_items_from_dict` to
`collect_mm_items_from_processor_output` and
`_process_and_create_mm_items` to
`_process_and_collect_mm_items` to improve code readability
and reflect their function more accurately.

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
@JustinTong0323 JustinTong0323 changed the title from "Refactor mm processors: Enable mixed modality processing" to "[WIP] Refactor mm processors: Enable mixed modality processing" Jun 29, 2025
@JustinTong0323 JustinTong0323 marked this pull request as ready for review June 29, 2025 21:08
JustinTong0323 and others added 4 commits June 29, 2025 14:09
Ensures 'audio' is only added to kwargs for Gemma3n processors.

This resolves an issue where the 'audio' keyword argument was being incorrectly added to the keyword arguments for other multimodal processors, causing compatibility issues.

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
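A hedged sketch of the guard described above; the exact condition used in the PR may differ (the class-name check here is illustrative):

```python
def build_processor_kwargs(processor, text, images, audios):
    kwargs = {"text": text}
    if images:
        kwargs["images"] = images
    # Only Gemma3n-style processors accept an `audio` kwarg; passing it to
    # other processors caused compatibility issues, hence the guard.
    if audios and "Gemma3n" in type(processor).__name__:
        kwargs["audio"] = audios
    return kwargs
```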
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
@JustinTong0323
Collaborator Author

The current mm data flow in the processor is shown below:

graph TD
    subgraph "Phase 1: Loading Data"
        direction LR
        Start((Start)) --> UserInput["User Input <br> (prompt, image_data, audio_data)"];
        UserInput --> LoadMMData["load_mm_data"];
        LoadMMData --> SplitPrompt["Split prompt by special tokens <br> e.g., <image>"];
        SplitPrompt --> SubmitTasks["submit_data_loading_tasks"];
        SubmitTasks --> ParallelLoad["Parallel Loading (IO Thread Pool) <br> _load_single_item loads each file"];
        ParallelLoad --> ReconstructText["Reconstruct text prompt & <br> Collect loaded raw data (PIL Images, etc.)"];
        ReconstructText --> BaseOutput["BaseMultiModalProcessorOutput <br> (text with special tokens, raw images, raw audios)"];
    end

    subgraph "Phase 2: Processing and Combining"
        direction LR
        BaseOutput --> ProcessAndCombine["process_and_combine_mm_data"];
        ProcessAndCombine --> CategorizeData{"Categorize data <br> pre-processed vs raw"};
        CategorizeData -- "Raw data" --> HFProcessor["process_mm_data <br> (Calls HuggingFace processor)"];
        HFProcessor --> ProcessorOutput["Processor Output <br> (input_ids, pixel_values, etc.)"];
        CategorizeData -- "Pre-processed data" --> CollectItems["collect_mm_items_from_processor_output"];
        ProcessorOutput --> CollectItems;
        CollectItems --> CreateMMItems["Create MultimodalDataItem objects <br> with processed features"];
        CreateMMItems --> AddOffsets["Find special token positions in input_ids <br> and add offsets to MultimodalDataItems"];
        AddOffsets --> FinalOutput([Final Output <br> List of MultimodalDataItem, <br> input_ids tensor]);
        FinalOutput --> End((End));
    end

Ensures `Modality.from_str` is case-insensitive and raises a ValueError with a helpful message when an invalid modality string is provided. This improves the user experience by providing more informative error messages.

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
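A minimal sketch of such a helper, matching the behavior described in the commit message (the enum members listed here are illustrative):

```python
from enum import Enum, auto


class Modality(Enum):
    IMAGE = auto()
    VIDEO = auto()
    AUDIO = auto()

    @staticmethod
    def from_str(modality_str: str) -> "Modality":
        try:
            return Modality[modality_str.upper()]
        except KeyError:
            raise ValueError(
                f"Invalid modality: {modality_str!r}. "
                f"Valid modalities: {[m.name for m in Modality]}"
            ) from None
```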

Replaces checks `if images is not None` with `if images` for conciseness.
This change ensures that multimodal data processing logic correctly handles empty lists/None values for images, videos, and audios.

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
@JustinTong0323
Collaborator Author

/gemini summary

@gemini-code-assist
Contributor

Summary of Changes

This pull request introduces a significant refactoring of the multimodal processing pipeline to enable seamless handling of mixed modalities, such as images and audio, within a single request. The changes aim to unify the processing flow, reduce code duplication, and improve the overall maintainability and extensibility of the multimodal components by centralizing data handling and token padding logic.

Highlights

  • Unified Multimodal Processing: The core multimodal processing logic in base_processor.py has been significantly refactored to support mixed modalities (images and audio) within a single request. This includes a new approach to collecting and combining MultimodalDataItem objects directly from processor outputs, streamlining the data handling process.
  • Simplified Token Padding: The pad_input_tokens utility in mm_utils.py has been redesigned. It now directly replaces multimodal tokens based on their specific token IDs (image, audio, video) using a mapping, simplifying the previous logic that relied on finding contiguous regions of tokens.
  • Generalized Multimodal Data Handling: The tokenizer manager (tokenizer_manager.py) has been updated to pass audio_data alongside image_data to the multimodal processor. Additionally, generic image_inputs references have been updated to mm_inputs across the codebase, reflecting the broader support for various multimodal inputs.
  • Improved Modality Mapping: A new ATTR_NAME_TO_MODALITY mapping has been introduced in the base processor (base_processor.py). This mapping dynamically determines the modality type from processor output attributes, enabling more flexible and extensible creation of multimodal data items.
  • Streamlined Model-Specific Processors: Numerous model-specific multimodal processors (e.g., Gemma3, Gemma3n, Kimi-VL, Qwen-VL, VILA) have been updated. They now leverage the new unified processing flow, reducing redundant logic and preparing them for mixed modality support by adapting to the new mm_items list return type.
Changelog
  • python/sglang/srt/managers/mm_utils.py
    • Refactored pad_input_tokens to use a token-to-pad mapping for different modalities, simplifying the logic for replacing multimodal tokens.
    • Updated the padding pattern class from MultiModalityDataPaddingPatternTokenPairs to MultiModalityDataPaddingPatternMultimodalTokens.
  • python/sglang/srt/managers/schedule_batch.py
    • Added a static method Modality.from_str for robust conversion of string to Modality enum, including error handling.
  • python/sglang/srt/managers/tokenizer_manager.py
    • Modified _tokenize_one_request to accept and process audio_data alongside image_data.
    • Generalized image_inputs to mm_inputs in _tokenize_one_request and _create_tokenized_object for broader multimodal support.
    • Updated _validate_batch_tokenization_constraints to check for generic multimodal input instead of just image data.
  • python/sglang/srt/models/gemma3n_mm.py
    • Updated import and usage of the multimodal padding pattern to the new MultiModalityDataPaddingPatternMultimodalTokens.
    • Simplified the pad_input_ids function by removing explicit media token pair collection.
  • python/sglang/srt/multimodal/processors/base_processor.py
    • Removed the MultimodalInputFormat enum.
    • Introduced ATTR_NAME_TO_MODALITY mapping to dynamically associate processor output attributes with their respective modalities.
    • Updated process_mm_data and process_mm_data_async to handle audio data.
    • Completely refactored process_and_combine_mm_data to support mixed modalities, using new helper methods collect_mm_items_from_processor_output and _process_and_collect_mm_items.
  • python/sglang/srt/multimodal/processors/clip.py
    • Simplified process_mm_data_async by removing redundant checks for empty or non-list image_data.
  • python/sglang/srt/multimodal/processors/deepseek_vl_v2.py
    • Simplified process_mm_data_async by removing redundant checks for empty or non-list image_data.
  • python/sglang/srt/multimodal/processors/gemma3.py
    • Simplified process_mm_data_async by removing redundant checks for empty or non-list image_data.
    • Updated return value from a single combined_mm_item to a list of mm_items.
  • python/sglang/srt/multimodal/processors/gemma3n.py
    • Simplified process_mm_data_async by removing redundant checks for empty or non-list image_data and audio_data.
    • Updated return value from a single combined_mm_item to a list of mm_items.
    • Changed token ID keys from im_start_id/audio_start_id to im_token_id/audio_token_id.
  • python/sglang/srt/multimodal/processors/internvl.py
    • Simplified process_mm_data_async by removing redundant checks for empty or non-list image_data.
  • python/sglang/srt/multimodal/processors/janus_pro.py
    • Simplified process_mm_data_async by removing redundant checks for empty or non-list image_data.
  • python/sglang/srt/multimodal/processors/kimi_vl.py
    • Simplified process_mm_data_async by removing redundant checks for empty or non-list image_data.
    • Updated return value from a single combined_mm_item to a list of mm_items.
  • python/sglang/srt/multimodal/processors/llava.py
    • Simplified process_mm_data_async by removing redundant checks for empty or non-list image_data.
  • python/sglang/srt/multimodal/processors/minicpm.py
    • Updated process_mm_data_async to explicitly accept audio_data and removed redundant checks for empty or non-list image_data/audio_data.
  • python/sglang/srt/multimodal/processors/mlama.py
    • Simplified process_mm_data_async by removing redundant checks for empty or non-list image_data.
  • python/sglang/srt/multimodal/processors/mllama4.py
    • Simplified process_mm_data_async by removing redundant checks for empty or non-list image_data.
  • python/sglang/srt/multimodal/processors/phi4mm.py
    • Updated process_mm_data_async to explicitly accept audio_data and removed redundant checks for empty or non-list image_data/audio_data.
  • python/sglang/srt/multimodal/processors/pixtral.py
    • Simplified process_mm_data_async by removing redundant checks for empty or non-list image_data.
  • python/sglang/srt/multimodal/processors/qwen_vl.py
    • Simplified process_mm_data_async by removing redundant checks for empty or non-list image_data.
    • Updated return value from a single combined_mm_item to a list of mm_items.
    • Added a note/FIXME about only processing the first item in mm_items.
  • python/sglang/srt/multimodal/processors/vila.py
    • Initialized IM_TOKEN_ID and VIDEO_TOKEN_ID in __init__.
    • Simplified process_mm_data_async by removing redundant checks for empty or non-list image_data.
    • Updated return value from a single combined_mm_item to a list of mm_items and included im_token_id/video_token_id.
Activity
  • JustinTong0323 provided a detailed Mermaid diagram illustrating the new two-phase multimodal data processing flow (Loading Data and Processing & Combining).
  • gemini-code-assist[bot] raised a high-priority concern regarding Modality.from_str in schedule_batch.py, noting a mismatch between the KeyError it raises and the ValueError expected by calling code, suggesting a fix to re-raise as ValueError and handle case insensitivity.
  • gemini-code-assist[bot] identified a medium-priority issue with a leftover debug print statement that should be removed.
  • mickqian suggested managing all special tokens into a single MultimodalSpecialTokens class, potentially leveraging chat templates.
  • mickqian suggested that the ATTR_NAME_TO_MODALITY mapping in base_processor.py might be more appropriate if passed model-specifically or simplified to only include real data attributes. JustinTong0323 acknowledged this, suggesting moving the mapping to subclasses or refactoring MMItem.
  • mickqian suggested skipping a judgment in base_processor.py.
  • mickqian suggested renaming collect_mm_items_from_processor_output and _process_and_collect_mm_items for clarity.
  • mickqian noted a TODO for the modality of precomputed features to be passed along with data in the future.
  • gemini-code-assist[bot] raised a critical concern in mm_utils.py's pad_input_tokens, indicating a regression where token_to_pad_mapping incorrectly handles multiple multimodal items of the same modality by overwriting pad_value. JustinTong0323 responded by stating the assumption that only one mm_item is expected per modality after combining.
  • gemini-code-assist[bot] highlighted a high-priority issue in gemma3n_mm.py and qwen_vl.py where the reliance on the flawed pad_input_tokens (mentioned above) could lead to incorrect behavior.
  • gemini-code-assist[bot] identified a high-priority regression in qwen_vl.py, where the code now only processes the first multimodal item (mm_items[0]), ignoring others, which limits the model's capability for multiple images.

JustinTong0323 and others added 2 commits June 30, 2025 01:22
Refactors the multimodal token padding mechanism by removing the need to explicitly pass token IDs to the `MultiModalityDataPaddingPatternMultimodalTokens` class.

This change streamlines the padding process and makes it more consistent across different models.

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
@JustinTong0323 JustinTong0323 changed the title from "[WIP] Refactor mm processors: Enable mixed modality processing" to "[WIP] Refactor mm processors and Enable mixed modality processing" Jun 30, 2025
@JustinTong0323 JustinTong0323 requested a review from mickqian July 1, 2025 06:10
@JustinTong0323 JustinTong0323 changed the title from "[WIP] Refactor mm processors and Enable mixed modality processing" to "Refactor mm processors and Enable mixed modality processing" Jul 1, 2025
@zhyncs zhyncs merged commit 3a911b8 into sgl-project:main Jul 1, 2025
174 of 199 checks passed
chenxijun1029 pushed a commit to chenxijun1029/sglang that referenced this pull request Jul 17, 2025
…ect#7629)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
pi314ever pushed a commit to pi314ever/sglang that referenced this pull request Jul 17, 2025
* Use seq_len_fill_value in the cuda graph runners (sgl-project#7233)

* support custom weight loader for model runner (sgl-project#7122)

Co-authored-by: kavioyu <kavioyu@tencent.com>

* Fix AMD speculative decoding (sgl-project#7252)

* [Refactor] OAI Server components (sgl-project#7167)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>

* OAI Server Skeleton & Core Utility Endpoints (sgl-project#7179)

* [amd] Opt dsv3 moe (sgl-project#7160)

Co-authored-by: wunhuang <wunhuang@amd.com>

* update ci node for xeon (sgl-project#7265)

* feat: mtp support dp-attention (sgl-project#6081)

Co-authored-by: austindeng <austindeng@tencent.com>
Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
Co-authored-by: ch-wan <cwan39@gatech.edu>

* support qwen2 running on ascend npu device (sgl-project#7022)

Co-authored-by: 刁莹煜 <diaoyingyu1@hisilicon.com>

* Fix Deepseek R1 0528 FP4 tensor name mismatch issue during weights loading. (sgl-project#7164)

* bugfix(tool call ebnf): Fix EBNF generation for optional function parameters (sgl-project#7283)

* Fix AWQ Dequant and Weight Loading of deepseek v2 (sgl-project#6842)

* fix: resolve b200 dsv3 mtp issue (sgl-project#7286)

* ci: Fix test_ebnf_generate_all_optional_function_params (sgl-project#7288)

* fix: only enable flash_attn test on sm80 sm90 (sgl-project#7289)

* [PD] Support get local ip from NIC for PD disaggregation (sgl-project#7237)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* [PD] Add custom memory pool option to support Mooncake PD with NVLink  (sgl-project#7264)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* Upstreaming hicache bug fixes (sgl-project#7267)

* Update python API of activation, topk, norm and rope and remove vllm dependency (sgl-project#6614)

Co-authored-by: Wu, Chunyuan <chunyuan.wu@intel.com>
Co-authored-by: jianan-gu <jianan.gu@intel.com>
Co-authored-by: sdp <sdp@gnr799219.jf.intel.com>

* Fix hicache benchmark script bug - some sampled input_request is [] (sgl-project#7300)

* chore: change logs from`INFO` to `DEBUG` for dp and add force quit for tokenizer manager (sgl-project#7251)

* update invalid link in doc (sgl-project#7297)

* Fix mini_lb for PD with long output: limit chunk size of decode response (sgl-project#7301)

Signed-off-by: ch-tiger1 <xyz@ch-tech.ip-ddns.com>
Co-authored-by: ch-tiger1 <xyz@ch-tech.ip-ddns.com>

* Fix profiler error when there are idle passes (sgl-project#7003)

* [pd] optimize dockerfile for  pd disaggregation (sgl-project#7319)

Co-authored-by: zhyncs <me@zhyncs.com>

* Merge PDLB (Prefill-Decode Load Balancer) into SGLang Router (sgl-project#7096)

* Add more refactored openai test & in CI (sgl-project#7284)

* fix: resolve blackwell deepep image issue (sgl-project#7331)

* add seed in CPU UTs to avoid flaky failure (sgl-project#7333)

* Multi-Stage Awake: Support Resume and Pause KV Cache and Weights separately (sgl-project#7099)

* Reintroduce tiny fix sampler error when prob is not contiguous (sgl-project#7354)

* [Refactor] Clean up radix cache related API (sgl-project#7303)

Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>

* Put `_normalize_rid` before other normalization in `io_struct` (sgl-project#7363)

* [PD] Transfer hidden states for mtp when disaggregation (sgl-project#7242)

* [Bugfix][PD] Set conclude state before clear when failure happens (sgl-project#7362)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* docs: update installation (sgl-project#7366)

* [Docker] optimize dockerfile  remove deepep and blackwell merge it to… (sgl-project#7343)

Co-authored-by: Yineng Zhang <me@zhyncs.com>

* Clean unused import for mimo mtp model (sgl-project#7370)

* [Bugfix]Fix hang bug using dp attention with HiRadixCache (sgl-project#7159)

Signed-off-by: huanglong <huanglong@linux.alibaba.com>

* [Doc] add embedding rerank doc (sgl-project#7364)

* Fix judgment condition for enabling Deepseek V3/R1 shared expert fusion optimization (sgl-project#7371)

* Feat/refactor embedding server (sgl-project#7322)

* Purge VerlEngine (sgl-project#7326)

Signed-off-by: Ata Fatahi <immrata@gmail.com>

* support return logprobs for pipeline (sgl-project#7356)

Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>

* [PD] Optimize custom mem pool usage and bump mooncake version (sgl-project#7393)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* Support THUDM/GLM-4-0414 (GLM-Z1) Glm4ForCausalLM architecture. (sgl-project#5485)

* Refine OpenAI serving entrypoint to remove batch requests (sgl-project#7372)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: Chang Su <csu272@usc.edu>

* [Feature] Comprehensive Hybrid Parallelism Support (sgl-project#6389)

* [DeepSeekNextN] fix: residual of head norm can be None (sgl-project#7398)

* [OAI refactor] Add rerank and score serving (sgl-project#7399)

Co-authored-by: Chang Su <chang.s.su@oracle.com>

* [OAI Server Refactor] [ChatCompletions & Completions] Implement UsageInfo Processor (sgl-project#7360)

Co-authored-by: Chang Su <chang.s.su@oracle.com>

* Fix All-Gather under world size one (sgl-project#7219)

* Optimize DP attn scheduling for speculative decoding (sgl-project#7285)

* Update usage_processor.py (sgl-project#7402)

* Fix 7285 Merge Conflicts (sgl-project#7403)

* chore: upgrade mooncake-transfer-engine 0.3.4 (sgl-project#7401)

* [OAI Server Refactor] [ChatCompletions & Completions] Support Return Hidden State (sgl-project#7329)

Signed-off-by: keru <rukeyang@gmail.com>

* Remove batches api in docs & example (sgl-project#7400)

* [BugFix]: fix EmbeddingReqInput single input error (sgl-project#7396)

* [BugFix]fix qwen25 invoke function call streaming responses with curly braces as the starting indicator (sgl-project#7394)

* fix overlap pagecount (sgl-project#6984)

Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>

* fix: Fix CI test_function_call_parser.py (sgl-project#7425)

* Fix CPU offloading for MLA memory pool (sgl-project#7409)

* [fix] PD disaggregation when enable mtp and tp!=dp (sgl-project#7420)

* feat(oai refactor): Replace `openai_api` with `entrypoints/openai`  (sgl-project#7351)

Co-authored-by: Jin Pan <jpan236@wisc.edu>

* Refactor LoRAManager and LoRAMemoryPool state management logic for dynamic LoRA loading support (sgl-project#7412)

* refactor(test): reorganize OpenAI test file structure (sgl-project#7408)

* [minor] simplify the `TokenToKVPoolAllocator` (sgl-project#7414)

* Tiny add logging for GC  (sgl-project#7406)

* FlashInfer NVFP4 MoE with EP & 2-stream shared expert (sgl-project#7327)

Co-authored-by: JieXin Liang <Alcanderian@users.noreply.github.com>
Co-authored-by: alcanderian <alcanderian@gmail.com>

* Remove copy after bmm (sgl-project#7441)

* Fix torch compile run (sgl-project#7391)

Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>

* [misc] Add PD service discovery support in router (sgl-project#7361)

* add fused moe config for qwen3 in triton3.3.1 (sgl-project#7445)

* Fix CUDA Graph Check under Deepep with DP FFN (sgl-project#7451)

* Update hyperparameter_tuning.md (sgl-project#7454)

* feat: integrate deepgemm into EPMoE (sgl-project#6821)

Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com>
Co-authored-by: TianQiLin666666 <1834987979@qq.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>

* Solve docker build failed in the virtual machine (sgl-project#7290)

Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>

* Fix a bug in BatchTokenIDOut & Misc style and dependency updates (sgl-project#7457)

* [CI] Upgrade mooncake to 0.3.4.post1 to fix 8 gpu tests (sgl-project#7472)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* Fix prefill OOM due to wrong token calculation when page > 1  (sgl-project#7397)

* feat(func_call): Add more check in `BaseFormatDetector.parse_streaming_increment` (sgl-project#7479)

* Fix dtype for idle input in spec decoding (sgl-project#7456)

* update mooncake in dockerfile (sgl-project#7480)

* kvcache io kernels and test case (sgl-project#7382)

* [perf] slightly improve DeepSeek-R1-FP4 TP8 (sgl-project#7481)

* Quick fix for DeepGemm requant to also cover MTP. (sgl-project#7378)

* Support weight loading without mmap (sgl-project#7469)

* ci: Revert openai_server related tests in AMD suites (sgl-project#7449)

* Performance: Enable cuda graph for dp idle batch (sgl-project#7269)

Co-authored-by: austindeng <austindeng@tencent.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: ch-wan <cwan39@gatech.edu>

* bugfix: Prevent global mutation of conv.stop_str across requests (sgl-project#7347)

Co-authored-by: Chang Su <chang.s.su@oracle.com>

* Fix RequestValidationError response format (sgl-project#7487)

* Fix MTP with Deepseek R1 Fp4 (sgl-project#7376)

* chore: bump sgl-kernel v0.2.0 (sgl-project#7490)

* chore: bump v0.4.8 (sgl-project#7493)

* [AMD] add aiter fused moe in DeepEP path (sgl-project#7268)

* enable aiter_biased_grouped_topk kernel (sgl-project#7423)

* [PD Disaggregation] replace transfer with batch transfer for better performance (sgl-project#7236)

* Remove cumsum_buffer initialization (sgl-project#7439)

* [benchmark] fbgemm benchmark support bandwidth report and support fbgemm_cutlass_gmm (sgl-project#7422)

* Support multi-thread model weight loading (sgl-project#7277)

* [PD] NIXL: Register kv args in advance and cleanup finished requests (sgl-project#6717)

* fix: Add `--model` as an alias for `--model-path` in server_args (sgl-project#7505)

* misc: Improvement to serving_chat.py and add more ut (sgl-project#7489)

* Fuse sorted_token_ids padding to moe_align_block_size kernel (sgl-project#7437)

* [OAI] patch origin request_id logic (sgl-project#7508)

* [PD][Spec] Fix hidden state transfer for spec decode (sgl-project#7516)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* EPLB support for MTP (sgl-project#7510)

* clean duplicate code (sgl-project#7512)

* [ci] add router benchmark script and CI (sgl-project#7498)

* fix: force synchronization between TP workers when update_weights (sgl-project#6626)

Co-authored-by: dangkai.dk <dangkai.dk@alibaba-inc.com>

* [CPU] [BF16] Call fused_experts_cpu, weight_packed_linear and bmm_cpu kernel in DeepSeek model (sgl-project#6641)

Co-authored-by: Thien Tran <gau.nernst@yahoo.com.sg>

* [CI] Upgrade mooncake to v0.3.4.post2 to fix potential slice failed bug (sgl-project#7522)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* npu fused op (sgl-project#7386)

Co-authored-by: Li Junwen <lijunwen13@hisilicon.com>

* feat: send kvmetrics from sglang scheduler (sgl-project#6721)

* [PD] Add different TP sizes support for no-MLA models (sgl-project#6793)

Co-authored-by: shangmingc <csmthu@gmail.com>
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>

* enable aiter fp8 blockscale quant (sgl-project#7520)

* take aiter get_rope back (sgl-project#7521)

* Fix typo of flash_cache (sgl-project#7513)

* feat: add return hidden_states at async generation (sgl-project#7507)

* minor: 'role' must be system/assistant/tool, but case insensitive for now (sgl-project#7499)

* Fix FP8 KV Cache Support in FA3 Backend (sgl-project#7148)

* Fix gathered_buffer issues in tbo (sgl-project#7531)

* [PD] Raise error for incompatible mooncake version and some minor fixes (sgl-project#7527)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* [CMake] Fix sgl-kernel CMakeLists for Blackwell (sgl-project#7543)

* Add Tencent HunYuanMoEV1 model support (sgl-project#7549)

* Update seed in CPU UTs to avoid flaky failure with single test (sgl-project#7544)

* chore: improve ci bug reporting (sgl-project#7542)

* chore: remove vlm unnecessary import (sgl-project#7541)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>

* chore: bump v0.4.8.post1 (sgl-project#7559)

* [PD][NIXL] Set is_sorted=False to fix NIXL_ERR_NOT_FOUND (sgl-project#7330)

* [Fix] incorrect assert in EPLB (sgl-project#7575)

* Updates Gemma3n MLP layer to adapt latest transformers version (sgl-project#7573)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>

* Fix MTP error when enabling two-batch overlap  (sgl-project#7569)

* Add e2e test for multi instance multi stage memory release/resume occupation (sgl-project#7208)

Signed-off-by: Ata Fatahi <immrata@gmail.com>

* [CI] Add CI Testing for Prefill-Decode Disaggregation with Router (sgl-project#7540)

* Updates transformers and timm dependencies (sgl-project#7577)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>

* feat: support compatibility between MTP and two-batch-overlap (sgl-project#7225)

Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>

* Move multimodal processors into a separate folder (sgl-project#7581)

* Fix broken CI TestVILAServer (sgl-project#7610)

* [router] add centralized configuration module for sgl-router (sgl-project#7588)

* Fix: Minicpm (sgl-project#7612)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>

* Hybrid kv cache for LLaMA4 (sgl-project#6563)

Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: tarinkk <rt572@physics.rutger.edu>
Co-authored-by: tarinkk <rt572@rutgers.physics.edu>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>

* [CPU] add optimizations for INT8 and FP8 DeepSeek (sgl-project#6769)

Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com>

* Tiny add logs for expert location updater (sgl-project#7308)

* Fix flakiness in LoRA batch test. (sgl-project#7552)

* [BUG] fix local_rank in initialize_dp_attention (sgl-project#7584)

* Support dynamic LoRA loading / unloading in engine/server API (sgl-project#7446)

* [PD] Respect sampling_params.max_new_tokens when PD disaggregation is activated (sgl-project#7598)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* fix unit tests (sgl-project#7618)

* Let ep_scatter support arbitrary strides / ue8m0 format (sgl-project#7309)

* Let EP prefill support new DeepGEMM (sgl-project#7310)

* docs: add gb200 nvl72 and a16z grant (sgl-project#7620)

* oai: Adds support for OpenAI chat completions API in bench_serving (sgl-project#7036)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: yhyang201 <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>

* [bugfix] Remove PR comment posting from Rust benchmark workflow (sgl-project#7625)

* [Minor] clean up multimodal processor and tokenizer manager (sgl-project#7624)

* Add dsv3 fused a gemm to sgl-kernel (sgl-project#7630)

* Add @mickqian as the CODEOWNERS of multimodal (sgl-project#7636)

* Fix stream reasoning parser and Adds Kimi reasoning parser  (sgl-project#7432)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>

* Fix sgl-router startup crash (sgl-project#7619)

* [bugfix] fix runtime dropping panic in editable (sgl-project#7628)

* Move files related to EPLB (sgl-project#7580)

* [misc] reduce weird rope_scaling_factor warning (sgl-project#7176)

* [AMD] Add unit-test-sgl-kernel-amd to AMD CI (sgl-project#7539)

* Update CODEOWNERS (sgl-project#7640)

* [EAGLE] remove a wrong adjustment for page_size > 1 & topk > 1 in server_args.py (sgl-project#7643)

* [CPU] add c++ kernel to bind CPU cores and memory node (sgl-project#7524)

* Improve streaming, log_level, memory report, weight loading, and benchmark script (sgl-project#7632)

Co-authored-by: Kan Wu <wukanustc@gmail.com>

* Add dsv3 router gemm kernel (sgl-project#7627)

* chore: upgrade flashinfer v0.2.7 jit (sgl-project#7663)

* [doc] update lws doc for pd (sgl-project#7318)

* Fix: sync prepare_fp8_layer_for_marlin with latest vllm changes (sgl-project#7648)

* Add small requirements for benchmark/parse_result tools (sgl-project#7671)

* [CPU] remove process_group from inputs of shm_allreduce and shm_allgather (sgl-project#7486)

* chore: bump sgl-kernel v0.2.1 (sgl-project#7675)

* support llama4 eagle3  (sgl-project#6985)

Co-authored-by: shuaills <shishuaiuoe@gmail.com>
Co-authored-by: Shenggui Li <somerlee.9@gmail.com>
Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com>
Co-authored-by: yizhang2077 <1109276519@qq.com>

* Refactor mm processors and Enable mixed modality processing (sgl-project#7629)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>

* upgrade sgl kernel to 0.2.1 for main (sgl-project#7676)

* add description for llama4 eagle3 (sgl-project#7688)

* fix(model loader): use safe_open to prevent file handle leaks. (sgl-project#7684)

* chore: upgrade flashinfer v0.2.7.post1 (sgl-project#7698)

* Improve error handling for requests with unloaded LoRA path(s) (sgl-project#7642)

* Apply dsv3_fused_a_gemm kernel (sgl-project#7635)

* Fix GPTQMarlinMoE (sgl-project#7697)

* [1/n] apply wna16marlin kernel in moe weight only quantization (sgl-project#7683)

Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: yych0745 <1398089567@qq.com>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: 弋云 <yiyun.wyt@antgroup.com>
Co-authored-by: walker-ai <2398833647@qq.com>

* Apply dsv3 router gemm kernel for deepseek-r1 fp4 (sgl-project#7677)

* [AMD] Temporarily disable test_no_overlap_scheduler and test_vision_chunked_prefill (sgl-project#7717)

* [RL] add --skip-warmup (sgl-project#7416)

* [RL] support update_weights_from_distributed with different group and multiple weights (sgl-project#7292)

* [router] add --log-level to sgl-router (sgl-project#6512)

* [b200] support trt-llm allreduce fuse rms_norm_add kernel (sgl-project#7621)

* [CPU] Bind threads and numa node for each TP rank (sgl-project#6549)

Co-authored-by: srinarayan-srikanthan <srinarayan.srikanthan@intel.com>

* Support non-contiguous query input for extend/decode attention (sgl-project#7462)

* Support updating weights at once by stopping all requests (sgl-project#6698)

Signed-off-by: Tianyu Zhou <albert.zty@antgroup.com>
Co-authored-by: Zilin Zhu <zhuzilinallen@gmail.com>

* Fix num_tokens_pre_allocated in disaggregation log (sgl-project#7714)

* [CPU] [sgl-kernel] set dispatch key of initialize to CatchAll (sgl-project#7734)

* [CPU] fix all_reduce and all_gather (sgl-project#6770)

Co-authored-by: blzheng <beilei.zheng@intel.com>

* fix awq and dsv3 fused gemm compatible (sgl-project#7735)

* [CI][Router] Fix bench_one_batch_server for pd router test (sgl-project#7731)

Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>

* Add CUTLASS FP8 Blockscale MoE kernel for Hopper architecture (sgl-project#7278)

Co-authored-by: HydraQYH <QYH820@Outlook.com>
Co-authored-by: TianQiLin666666 <1834987979@qq.com>

* fix dsv3 fused proj check  (sgl-project#7738)

* Ascend attention backend(PA&MLA) (sgl-project#7722)

Co-authored-by: Maksim <makcum888e@mail.ru>
Co-authored-by: VDV1985 <vladdv85@mail.ru>

* [fix] fix dsv3_router_gemm filter (sgl-project#7750)

* [CPU] refine CPU integration code (sgl-project#7647)

* [CPU] support the case where num_attention_heads or intermediate_size is not divisible by the TP size (sgl-project#6771)

* support qwen3 dense model dp attention (sgl-project#7681)

* [optimize] add two stream norm for qwen3 (sgl-project#7740)

Co-authored-by: ispobock <ispobaoke@gmail.com>

* feat: use D2D instead of H2H in pp (sgl-project#7673)

Co-authored-by: alpha-baby <fujianhao1997@qq.com>

* [Bug] add flashinfer bool check for fusedmoe in Qwen moe models (sgl-project#7723)

* [fix] put cpu in the first priority in get_device() (sgl-project#7752)

* [optimize] fuse renormalize into moe_topk_softmax (sgl-project#7744)

Co-authored-by: ispobock <ispobaoke@gmail.com>

* chore: bump sgl-kernel 0.2.2 (sgl-project#7755)

* fix CI: update native api ipynb (sgl-project#7754)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>

* fuse renormal into moe topk softmax kernel python code (sgl-project#7751)

Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: zhyncs <me@zhyncs.com>

* Remove type conversion and fix id map in topk (sgl-project#7759)

* Add V2-lite model test (sgl-project#7390)

Co-authored-by: DiweiSun <105627594+DiweiSun@users.noreply.github.com>

* refactor llama4 dp attention logic (sgl-project#7729)

* fix(docs): fix the broken link in `docs/references/production_metrics.md` (sgl-project#7741)

Signed-off-by: rudeigerc <rudeigerc@gmail.com>

* [fix] update bench_speculative.py for compatibility (sgl-project#7764)

Signed-off-by: Kay Yan <kay.yan@daocloud.io>

* Move mem_fraction_static adjustment for multimodal models to `server_args.py` & Fix session control & Other cleanups (sgl-project#7748)

* [RL] Add --nccl-port to prevent port conflict (sgl-project#7418)

* [RL] add pause and continue generation for async rl training (sgl-project#7419)

* [Fix] Alloc return type error (sgl-project#7778)

Signed-off-by: Capronir <839972205@qq.com>

* [feat] Support EAGLE3 for Qwen (sgl-project#7745)

Co-authored-by: 纬杭 <ximing.wxm@antgroup.com>
Co-authored-by: zyksir <zyksir@outlook.com>

* saving hidden_states.clone() (sgl-project#7705)

* [1/n]: add cutlass W4A8 moe kernel for hopper architecture (sgl-project#7772)

Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com>
Co-authored-by: yicwang <yichen.wang@bytedance.com>

* add model: qwen2-audio (sgl-project#7596)

* Optimize Hopper CUTLASS FP8 Blockwise Grouped GEMM Kernel in Small K Scenario (sgl-project#7782)

* Embedding parallel by attn_tp (sgl-project#7623)

* fix: fix apply_shuffle_mul_sum (sgl-project#7444)

* chore: bump sgl-kernel v0.2.3 (sgl-project#7784)

* fix: use nvidia-nccl-cu12 2.27.5 (sgl-project#7787)

* DP Attention with Auto DeepEP Dispatch (sgl-project#7222)

* chore: upgrade sgl-kernel v0.2.3 (sgl-project#7786)

* Fix incorrect spec_num_draft_tokens in draft_extend (sgl-project#7757)

* [fix] fix misusing of is_cuda (sgl-project#7790)

* Add treemask mode to build_eagle_tree & release sgl-kernel 0.2.3 (sgl-project#7756)

Co-authored-by: Pranjal Shankhdhar <pranjal.ssh@gmail.com>

* chore: bump sgl-kernel v0.2.4 (sgl-project#7800)

* ci: fix port args (sgl-project#7792)

* Fix CI test OOM issue. (sgl-project#7799)

* chore: upgrade sgl-kernel v0.2.4 (sgl-project#7801)

* chore: bump v0.4.9 (sgl-project#7802)

* fix merge conflict issue

* fix hpu attention NoneType issue

* fix alignment

* fix alignment2

* Ci failure fixes

* fix attention-backend choices

---------

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Signed-off-by: ch-tiger1 <xyz@ch-tech.ip-ddns.com>
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Signed-off-by: Ata Fatahi <immrata@gmail.com>
Signed-off-by: keru <rukeyang@gmail.com>
Signed-off-by: Tianyu Zhou <albert.zty@antgroup.com>
Signed-off-by: rudeigerc <rudeigerc@gmail.com>
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
Signed-off-by: Capronir <839972205@qq.com>
Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com>
Signed-off-by: Mohit Sinha <msinha@habana.ai>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: KavioYu <67678385+yukavio@users.noreply.github.com>
Co-authored-by: kavioyu <kavioyu@tencent.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: yhyang201 <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: kk <43161300+kkHuang-amd@users.noreply.github.com>
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: DiweiSun <105627594+DiweiSun@users.noreply.github.com>
Co-authored-by: u4lr451 <u4lr451@gmail.com>
Co-authored-by: austindeng <austindeng@tencent.com>
Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
Co-authored-by: ch-wan <cwan39@gatech.edu>
Co-authored-by: Yijie Zhu <762412795@qq.com>
Co-authored-by: 刁莹煜 <diaoyingyu1@hisilicon.com>
Co-authored-by: Charles Chen <pychen96@gmail.com>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: AniZpZ <zhuangsen.zp@antgroup.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: shangmingc <caishangming@linux.alibaba.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: YanbingJiang <yanbing.jiang@intel.com>
Co-authored-by: Wu, Chunyuan <chunyuan.wu@intel.com>
Co-authored-by: jianan-gu <jianan.gu@intel.com>
Co-authored-by: sdp <sdp@gnr799219.jf.intel.com>
Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: linzhuo <15313137931lz@gmail.com>
Co-authored-by: ch-tiger1 <tiger@ch-tech.ip-ddns.com>
Co-authored-by: ch-tiger1 <xyz@ch-tech.ip-ddns.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Co-authored-by: ybyang <10629930+whybeyoung@users.noreply.github.com>
Co-authored-by: Simo Lin <linsimo.mark@gmail.com>
Co-authored-by: Jinn <47354855+jhinpan@users.noreply.github.com>
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
Co-authored-by: DarkSharpness <76582120+DarkSharpness@users.noreply.github.com>
Co-authored-by: Atream <80757050+Atream@users.noreply.github.com>
Co-authored-by: Li Hui <lambert80.ios@gmail.com>
Co-authored-by: Huang Long <121648372+LLLL114@users.noreply.github.com>
Co-authored-by: woodx <124784234+woodx9@users.noreply.github.com>
Co-authored-by: Ata Fatahi <immrata@gmail.com>
Co-authored-by: strgrb <zhangkaihong.zkh@antgroup.com>
Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>
Co-authored-by: Wenbo Yang <solrex@users.noreply.github.com>
Co-authored-by: Chang Su <csu272@usc.edu>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: Keyang Ru <rukeyang@gmail.com>
Co-authored-by: ehuaa <ehuamail@163.com>
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: Jin Pan <jpan236@wisc.edu>
Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>
Co-authored-by: Trevor Morris <tmorris@nvidia.com>
Co-authored-by: JieXin Liang <Alcanderian@users.noreply.github.com>
Co-authored-by: alcanderian <alcanderian@gmail.com>
Co-authored-by: Ke Bao <ISPObaoke@163.com>
Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
Co-authored-by: xutizhou <xutingz@nvidia.com>
Co-authored-by: TianQiLin666666 <1834987979@qq.com>
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: Yuhong Guo <guoyuhong1985@outlook.com>
Co-authored-by: huangtingwei <141888744+huangtingwei9988@users.noreply.github.com>
Co-authored-by: Alex Sun <alex.s@amd.com>
Co-authored-by: valarLip <103567126+valarLip@users.noreply.github.com>
Co-authored-by: Francis <38564764+ssssnow@users.noreply.github.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: xianzhiT <xianzhitang@tencent.com>
Co-authored-by: yilian49 <43861414+yilian49@users.noreply.github.com>
Co-authored-by: DangKai <dangkai4u@outlook.com>
Co-authored-by: dangkai.dk <dangkai.dk@alibaba-inc.com>
Co-authored-by: Thien Tran <gau.nernst@yahoo.com.sg>
Co-authored-by: ll819214 <18801269230@163.com>
Co-authored-by: Li Junwen <lijunwen13@hisilicon.com>
Co-authored-by: zixuanzhang226 <zixuanzhang@bytedance.com>
Co-authored-by: Hongbo Xu <1320612015@qq.com>
Co-authored-by: shangmingc <csmthu@gmail.com>
Co-authored-by: eigen <52445717+yyihuang@users.noreply.github.com>
Co-authored-by: mlmz <54172054+minleminzui@users.noreply.github.com>
Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu>
Co-authored-by: Meng, Peng <pengmeng@tencent.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: tarinkk <129432511+tarinkk@users.noreply.github.com>
Co-authored-by: tarinkk <rt572@physics.rutger.edu>
Co-authored-by: tarinkk <rt572@rutgers.physics.edu>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com>
Co-authored-by: Sheng Qi <shengqi2018@pku.edu.cn>
Co-authored-by: finetune <82650881+finetunej@users.noreply.github.com>
Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com>
Co-authored-by: Kan Wu <wukanustc@gmail.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: narutolhy <582909902@qq.com>
Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com>
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
Co-authored-by: Shenggui Li <somerlee.9@gmail.com>
Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com>
Co-authored-by: Simon_CQK <cqk0100@gmail.com>
Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com>
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: yych0745 <1398089567@qq.com>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: 弋云 <yiyun.wyt@antgroup.com>
Co-authored-by: walker-ai <2398833647@qq.com>
Co-authored-by: Zilin Zhu <zhuzilinallen@gmail.com>
Co-authored-by: srinarayan-srikanthan <srinarayan.srikanthan@intel.com>
Co-authored-by: Albert <albert.zty@antgroup.com>
Co-authored-by: Ziming Huang <1520787127@qq.com>
Co-authored-by: ayrnb <70835312+ayrnb@users.noreply.github.com>
Co-authored-by: HydraQYH <QYH820@Outlook.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: Maksim <makcum888e@mail.ru>
Co-authored-by: VDV1985 <vladdv85@mail.ru>
Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: TianyuZhang1214 <tianyuzhang1214@163.com>
Co-authored-by: alpha-baby <fujianhao1997@qq.com>
Co-authored-by: Yuchen Cheng <rudeigerc@gmail.com>
Co-authored-by: Kay Yan <kay.yan@daocloud.io>
Co-authored-by: Caproni <40862361+Capronir@users.noreply.github.com>
Co-authored-by: Ximingwang-09 <72070413+Ximingwang-09@users.noreply.github.com>
Co-authored-by: 纬杭 <ximing.wxm@antgroup.com>
Co-authored-by: zyksir <zyksir@outlook.com>
Co-authored-by: SijiaYang <yangsijia.614@bytedance.com>
Co-authored-by: yicwang <yichen.wang@bytedance.com>
Co-authored-by: Leng Yue <lengyue@lengyue.me>
Co-authored-by: Qi Yuhang <45795032+HydraQYH@users.noreply.github.com>
Co-authored-by: Gang Chen <13298548+MoonBall@users.noreply.github.com>
Co-authored-by: Pranjal Shankhdhar <pranjal.ssh@gmail.com>
Co-authored-by: jay <jthakur@habana.ai>
@JustinTong0323 JustinTong0323 deleted the refactor-base-processor branch July 18, 2025 23:22