
Conversation

HermitSun (Contributor) commented on Apr 1, 2025

Motivation

Resolve #4822 ([Feature] Load model weight in parallel).

Modifications

Add support for loading safetensors weights with runai_streamer. It can be enabled by passing the option --load-format runai_streamer when launching the server.
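
For example, a minimal launch sketch (the model path below is a placeholder, and the exact set of server arguments may differ across SGLang versions):

    python -m sglang.launch_server --model-path /path/to/safetensors-model --load-format runai_streamer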

Checklist

"bitsandbytes",
"layered",
"remote",
"runai_streamer",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to add a description for it below.

HermitSun (Contributor, Author) replied:

Thanks for the reminder, I've already added it.
As for why I didn't add a description for remote: I think the logic of runai_streamer and remote can be merged, so I'll try to refactor this a bit later.
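
For context, the excerpt reviewed above extends the set of --load-format choices. Below is a minimal sketch of how such a choices list is typically registered with argparse; the surrounding names and the full list of choices here are assumptions for illustration, not the actual SGLang source:

    import argparse

    parser = argparse.ArgumentParser()
    # Hypothetical registration of the weight-loading format option.
    # Only "runai_streamer" is confirmed by this PR; the rest of the
    # list and the default value are assumed.
    parser.add_argument(
        "--load-format",
        type=str,
        default="auto",
        choices=[
            "auto",
            "pt",
            "safetensors",
            "dummy",
            "bitsandbytes",
            "layered",
            "remote",
            "runai_streamer",
        ],
        help="The format of the model weights to load.",
    )
    args = parser.parse_args()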

brayden-hai (Contributor) commented:

Hi @HermitSun, I'm wondering whether existing SGLang already supports the RunAI streamer: I was able to install it, but the performance was still not as good as expected. I'm interested in the S3 use case. Right now I'm using the basic MP model loader; I wonder if you have compared this performance with the MP loader in #7277.

ajmyyra commented on Oct 22, 2025

Hi @HermitSun, would you be willing to rebase this PR so it can be considered? I was working on a similar implementation for a PR when I found yours, and as you were first, it would be good to have your changes considered (and hopefully merged). If you're short on time, I can help test it out.

RunAI's Model Streamer performs up to 2x better when loading safetensors weights from a filesystem over a network (such as NFS). It has also been implemented under similar naming in vLLM, and since SGLang's model loader design seems to follow vLLM's quite closely, it would be good to support this in SGLang as well.
