[WIP] [NV] update DSR1 SGLang MTP configs on single node B200 by zbpatel · Pull Request #631 · InferenceMAX/InferenceMAX

zbpatel · 2026-02-04T22:47:50Z

This PR updates the NVIDIA B200 SGLang DSR1 agg configs and is a follow-up to #626. It currently branches from yunzhoul/update-sglang-mtp-configs because it depends on changes made there; I will clean up the git branching once that branch is merged.

Put each argument on its own line for easier reading and maintenance. Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>

functionstackx · 2026-02-05T00:00:54Z

@claude review plz

Klaud-Cold · 2026-02-05T00:01:15Z

Claude finished @functionstackx's task in 1m 21s

PR Review Complete

Gather context and read changed files
Review .github/configs/nvidia-master.yaml changes
Review benchmarks/dsr1_fp8_b200_mtp.sh (new file)
Review perf-changelog.yaml changes
Review runners/launch_b200-dgxc.sh changes
Provide comprehensive review feedback

Summary

This PR adds MTP (Multi-Token Prediction) support for DeepSeek R1 FP8 on B200 using SGLang's EAGLE speculative decoding. It includes a new benchmark script, config updates, and launcher modifications.

Changes Overview

| File | Change |
| --- | --- |
| `nvidia-master.yaml` | Bump sglang image to v0.5.8-cu130, add new `dsr1-fp8-b200-sglang-mtp` config |
| `dsr1_fp8_b200_mtp.sh` | New benchmark script with EAGLE speculative decoding |
| `perf-changelog.yaml` | Document the new MTP config |
| `launch_b200-dgxc.sh` | Add SPEC_SUFFIX for MTP script selection |

Issues Found

🔴 High Priority

1. Incorrect PR link in perf-changelog.yaml

The pr-link references PR #626 but this is PR #631:

pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/626  # Should be #631
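A check along these lines could catch this kind of mismatch before review. It is only a sketch: `pr_link_matches` is a hypothetical helper, not something in the repository, and it assumes the link always has the `/pull/<number>` form quoted above.

```shell
# Hypothetical helper: succeed only if the pr-link line points at the expected PR.
# $1 = a "pr-link: <url>" line, $2 = the expected PR number.
pr_link_matches() {
  link_number=$(printf '%s\n' "$1" | sed -n 's|.*/pull/\([0-9][0-9]*\).*|\1|p')
  [ "$link_number" = "$2" ]
}

line='pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/626'
if pr_link_matches "$line" 631; then
  echo "pr-link OK"
else
  echo "pr-link does not point at PR #631"
fi
```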


2. Inconsistency: --use-chat-template added to MTP script but not in non-MTP version

dsr1_fp8_b200_mtp.sh:93 adds --use-chat-template flag, but the non-MTP dsr1_fp8_b200.sh doesn't use it. This could lead to inconsistent benchmark results between MTP and non-MTP runs. Is this intentional?
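One way to make this kind of drift visible is to compare the two scripts for a given flag. The sketch below is illustrative only: `script_has_flag` and `compare_flag` are hypothetical helpers, and the non-MTP script is assumed to live next to the MTP one under `benchmarks/`.

```shell
# Hypothetical helper: succeed if the script file ($2) contains the flag ($1).
# -F treats the flag as a fixed string; -e keeps a leading "--" from being
# parsed as a grep option.
script_has_flag() {
  grep -qF -e "$1" -- "$2"
}

# Report whether two scripts agree on a flag; returns 1 when they differ.
compare_flag() {
  flag=$1 a=$2 b=$3
  if script_has_flag "$flag" "$a"; then in_a=yes; else in_a=no; fi
  if script_has_flag "$flag" "$b"; then in_b=yes; else in_b=no; fi
  if [ "$in_a" = "$in_b" ]; then
    echo "consistent: $flag"
  else
    echo "inconsistent: $flag ($a=$in_a, $b=$in_b)"
    return 1
  fi
}

# Example call (paths assumed):
# compare_flag --use-chat-template benchmarks/dsr1_fp8_b200_mtp.sh benchmarks/dsr1_fp8_b200.sh
```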

🟡 Medium Priority

3. Higher MAX_RUNNING_REQUESTS / CUDA_GRAPH_MAX_BATCH_SIZE values

MTP script uses 512 vs 128 in non-MTP:

dsr1_fp8_b200_mtp.sh:44-45: MAX_RUNNING_REQUESTS=512, CUDA_GRAPH_MAX_BATCH_SIZE=512
dsr1_fp8_b200.sh:38-39: MAX_RUNNING_REQUESTS=128, CUDA_GRAPH_MAX_BATCH_SIZE=128

This matches the higher conc-end: 512 in the config (vs conc-end: 64 for non-MTP), and the script comment says these values should match the maximum concurrency. It is still worth confirming that the memory requirements at this setting have been validated.
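The arithmetic side of this can be checked mechanically. The sketch below uses a hypothetical `check_capacity` helper and assumes both limits only need to be at least as large as `conc-end`; it says nothing about whether the GPU memory actually suffices.

```shell
# Hypothetical check: both server-side limits should be at least the largest
# concurrency the sweep requests. All three arguments are plain integers.
check_capacity() {
  max_running=$1 graph_batch=$2 conc_end=$3
  if [ "$max_running" -lt "$conc_end" ] || [ "$graph_batch" -lt "$conc_end" ]; then
    echo "limits ($max_running/$graph_batch) are below conc-end=$conc_end"
    return 1
  fi
  echo "limits cover conc-end=$conc_end"
}

check_capacity 512 512 512   # MTP values quoted in this review
check_capacity 128 128 64    # non-MTP values quoted in this review
```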

4. Changelog description says "concurrency 4-64" but config has "conc-end: 512"

perf-changelog.yaml states:

"Configurations: TP=8, EP=1, concurrency 4-64 for 1k1k, 1k8k, and 8k1k sequence lengths"

But the actual config shows conc-start: 4, conc-end: 512:

- { tp: 8, ep: 1, conc-start: 4, conc-end: 512, spec-decoding: mtp }
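A small comparison like the following could flag the mismatch automatically. It is a sketch under assumptions: `conc_range` is a hypothetical helper, and it expects `conc-start` and `conc-end` to appear adjacent and in that order, as in the entry quoted above.

```shell
# Hypothetical helper: pull "START-END" out of a config entry.
conc_range() {
  printf '%s\n' "$1" |
    sed -n 's/.*conc-start: *\([0-9][0-9]*\), *conc-end: *\([0-9][0-9]*\).*/\1-\2/p'
}

entry='- { tp: 8, ep: 1, conc-start: 4, conc-end: 512, spec-decoding: mtp }'
description='Configurations: TP=8, EP=1, concurrency 4-64 for 1k1k, 1k8k, and 8k1k sequence lengths'

range=$(conc_range "$entry")
case "$description" in
  *"concurrency $range "*) echo "description matches config ($range)" ;;
  *) echo "description does not mention concurrency $range" ;;
esac
```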


🟢 Low Priority / Suggestions

5. MTP script only supports TP=8, but this is consistent with the config

The script has a guard at lines 31-34:

if [[ $TP -ne 8 ]]; then
  echo "MTP only supports TP=8, got TP=$TP!"
  exit 1
fi

This is good and matches the config which only has TP=8 configurations. Just noting for awareness.

6. Non-MTP FP4 config also updated to v0.5.8

The dsr1-fp4-b200-sglang config was also updated from v0.5.6 to v0.5.8. This is expected for consistency, just noting it's included.

Code Quality

✅ Good practices observed:

Clear variable naming and documentation in the benchmark script
Appropriate use of SPEC_SUFFIX pattern in launcher (consistent with existing FRAMEWORK_SUFFIX)
Environment variables properly checked at script start
Server readiness wait before benchmarking
Cleanup with graceful docker stop
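For readers unfamiliar with the suffix pattern, here is a minimal sketch of how a launcher might pick the script. It is not the actual code in `launch_b200-dgxc.sh`: the `spec_suffix` helper, the `SPEC_DECODING` variable, and its `none` default are assumptions; only the `mtp` value and the two script names come from this PR.

```shell
# Hypothetical sketch: map a spec-decoding setting to a script-name suffix.
spec_suffix() {
  case "$1" in
    mtp) echo "_mtp" ;;
    *)   echo "" ;;
  esac
}

SPEC_DECODING=${SPEC_DECODING:-none}   # assumed variable name and default
SPEC_SUFFIX=$(spec_suffix "$SPEC_DECODING")

# Resolves to dsr1_fp8_b200.sh or dsr1_fp8_b200_mtp.sh depending on the setting.
echo "benchmarks/dsr1_fp8_b200${SPEC_SUFFIX}.sh"
```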

✅ EAGLE speculative decoding config looks reasonable:

--speculative-num-steps 2
--speculative-num-draft-tokens 3
--speculative-eagle-topk 1
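As a rough sanity check on these values, and assuming that a top-k of 1 yields a single linear draft chain (my reading of how the flags relate, not something stated in the PR), the draft-token count would be one more than the number of steps. The `expected_draft_tokens` helper is hypothetical.

```shell
# Assumption: with topk=1 the draft is a linear chain, so the expected
# draft-token count is num_steps + 1. Other topk values are not modelled here.
expected_draft_tokens() {
  num_steps=$1 topk=$2
  if [ "$topk" -ne 1 ]; then
    echo "unknown"
    return 0
  fi
  echo $((num_steps + 1))
}

expected_draft_tokens 2 1   # values from the config above
```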

Questions for the Author

Is the --use-chat-template flag intentionally added only to the MTP version? If so, should it be backported to the non-MTP script for consistency?
Have the memory requirements been validated for 512 concurrent requests with EAGLE speculative decoding?
The PR description mentions this is branched from yunzhoul/update-sglang-mtp-configs - is that PR ([NV] add DSR1 SGLang MTP configs on single node B200 #626) now merged, or does this PR include changes from both?

Recommendation

Request changes - Please fix the pr-link and concurrency description in perf-changelog.yaml before merging.
• Branch: zpatel/sglang_b200

Elnifio and others added 10 commits February 3, 2026 15:22

use the script way

e4073fb

updates certain knobs

408a174

update the runner spec suffix

f1c3ffc

prepare for sweeps

69532a5

bump concurrency

574bcd3

Format sglang.launch_server args for readability

719be18

Put each argument on its own line for easier reading and maintenance. Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>

Merge branch 'main' into yunzhoul/update-sglang-mtp-configs

33c4c67

remove unnecessary ps aux

74195bd

bumps concurrency

6ed6326

bump sglang b200 agg configs to 0.5.8

61e49a4

zbpatel requested review from csahithi, kedarpotdar-nv and yunzhoul-nv February 4, 2026 22:47

zbpatel self-assigned this Feb 4, 2026

zbpatel requested a review from a team as a code owner February 4, 2026 22:47

zbpatel added the NVIDIA label Feb 4, 2026

github-project-automation bot added this to InferenceMAX Board Feb 4, 2026

functionstackx added the sweep-enabled label Feb 5, 2026

functionstackx changed the title ~~[WIP] [NV] Update B200 DSR1 Sglang Agg Configs~~ [WIP] [NV] update DSR1 SGLang MTP configs on single node B200 Feb 5, 2026

yunzhoul-nv added sweep-enabled and removed sweep-enabled labels Feb 5, 2026

zbpatel added 3 commits February 5, 2026 14:39

update sgl b200 non-mtp config

5579bb2

grab updated fp8 mtp script

3cd3602

add v1 fp4 mtp config script

d858ed5

zbpatel wants to merge 13 commits into main from zpatel/sglang_b200


6 participants
