
[DRAFT]: Update GB200 and GB300 SGLANG FP8 DSR1 Disaggregated inference configs (STP only)#635

Draft
jgangani wants to merge 9 commits into main from
jgangani_dsr1_fp8_disagg_sgl_update

Conversation

@jgangani
Collaborator

@jgangani jgangani commented Feb 5, 2026

No description provided.

@jgangani jgangani requested a review from jthomson04 February 5, 2026 00:39
@functionstackx functionstackx changed the title Update GB200 and GB300 SGLANG FP8 DSR1 Disaggregated inference configs Update GB200 and GB300 SGLANG FP8 STP DSR1 Disaggregated inference configs Feb 5, 2026
@functionstackx functionstackx changed the title Update GB200 and GB300 SGLANG FP8 STP DSR1 Disaggregated inference configs Update GB200 and GB300 SGLANG FP8 DSR1 Disaggregated inference configs (STP only) Feb 5, 2026
@jgangani
Collaborator Author

jgangani commented Feb 5, 2026

@claude Review.

@Klaud-Cold
Collaborator

Klaud-Cold commented Feb 5, 2026

Claude finished @jgangani's task in 1m 17s


PR Review Complete

  • Gather context and read changed files
  • Review .github/configs/nvidia-master.yaml changes
  • Review perf-changelog.yaml changes
  • Review runners/launch_gb200-nv.sh changes
  • Check for any issues from previous review comments
  • Provide comprehensive review feedback

Summary

This PR updates GB200 and GB300 SGLANG FP8 DSR1 Disaggregated inference configs for STP mode, including new image version, updated parallelism settings, and CONFIG_FILE-based configuration.


🔴 Critical Issues

1. Missing elif in launch_gb200-nv.sh (Syntax Error)

File: runners/launch_gb200-nv.sh:12-20

The shell script is missing an elif keyword, which causes bash to exit with a syntax error when it parses the script:

if [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp4" ]]; then
    export MODEL_PATH="/mnt/lustre01/models/deepseek-r1-0528-fp4-v2/"
    export SRT_SLURM_MODEL_PREFIX="dsfp4"
if [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp8" ]]; then  # ❌ Missing 'elif'
    export MODEL_PATH="/mnt/numa1/groups/sa-shared/models/deepseek-r1-0528/"
    export SRT_SLURM_MODEL_PREFIX="dsr1-fp8"
else
    export MODEL_PATH=$MODEL
fi

Fix: Change the second if to elif:

if [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp4" ]]; then
    export MODEL_PATH="/mnt/lustre01/models/deepseek-r1-0528-fp4-v2/"
    export SRT_SLURM_MODEL_PREFIX="dsfp4"
elif [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp8" ]]; then
    export MODEL_PATH="/mnt/numa1/groups/sa-shared/models/deepseek-r1-0528/"
    export SRT_SLURM_MODEL_PREFIX="dsr1-fp8"
else
    export MODEL_PATH=$MODEL
fi

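The corrected if/elif/else chain can be sanity-checked before it ever reaches a runner. A minimal sketch (hypothetical values and a temp file, not the real launch script or model paths):

```shell
# Write a stripped-down version of the corrected branch logic to a temp file,
# syntax-check it with `bash -n` (parse only, no execution), then run it.
cat > /tmp/model_prefix_demo.sh <<'EOF'
MODEL_PREFIX="dsr1"
PRECISION="fp8"
if [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp4" ]]; then
    export SRT_SLURM_MODEL_PREFIX="dsfp4"
elif [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp8" ]]; then
    export SRT_SLURM_MODEL_PREFIX="dsr1-fp8"
else
    export SRT_SLURM_MODEL_PREFIX="$MODEL_PREFIX"
fi
echo "$SRT_SLURM_MODEL_PREFIX"
EOF
bash -n /tmp/model_prefix_demo.sh && bash /tmp/model_prefix_demo.sh  # prints: dsr1-fp8
```

With elif, exactly one branch fires and a single fi closes the whole chain.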

2. Missing fi Statement in launch_gb200-nv.sh

File: runners/launch_gb200-nv.sh:12-20

The first if block is never closed with fi. This will cause the script to fail.

Current code:

if [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp4" ]]; then
    ...
if [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp8" ]]; then  # This needs to be elif
    ...
else
    ...
fi  # Only one fi, but two 'if' statements

Fix: Use elif instead of a nested if (as mentioned above), which will properly close the structure with a single fi.
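Both issues can be caught before merging with bash's no-exec mode. A minimal sketch (hypothetical temp file reproducing the bug pattern): a second bare if instead of elif leaves the outer if unclosed, and `bash -n` reports it without executing anything.

```shell
# Reproduce the bug shape: the lone 'fi' closes the inner 'if',
# leaving the outer 'if' unterminated.
cat > /tmp/broken_if_demo.sh <<'EOF'
if [[ "$PRECISION" == "fp4" ]]; then
    echo fp4
if [[ "$PRECISION" == "fp8" ]]; then
    echo fp8
else
    echo default
fi
EOF
# `bash -n` parses without running; an unclosed 'if' makes it exit nonzero
# with an "unexpected end of file" diagnostic.
if bash -n /tmp/broken_if_demo.sh 2>/dev/null; then
    echo "syntax OK"
else
    echo "syntax error detected"
fi
```

Running `bash -n` in CI (or a pre-commit hook) would have flagged this script before any Slurm job was launched.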


⚠️ Warnings

1. Inconsistent SRT_SLURM_MODEL_PREFIX for dynamo-trt

File: runners/launch_gb200-nv.sh:21-34

The dynamo-trt block still matches on MODEL_PREFIX == "dsr1-fp8" (line 28), which is inconsistent with the fix applied to dynamo-sglang. The review feedback from PR #627 suggested using the MODEL_PREFIX == "dsr1" && PRECISION == "fp8" pattern to avoid breaking the frontend.

elif [[ $MODEL_PREFIX == "dsr1-fp8" ]]; then  # ⚠️ Still uses dsr1-fp8
    export MODEL_PATH="/mnt/numa1/groups/sa-shared/models/deepseek-r1-0528/"
    export SERVED_MODEL_NAME="deepseek-r1-fp8"

Recommendation: Apply the same pattern to dynamo-trt for consistency:

elif [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp8" ]]; then

Note: This may be intentional if dynamo-trt FP8 configs haven't been updated yet, but it will cause issues when they are.


✅ Good Changes

1. nvidia-master.yaml Updates

  • Image updated to lmsysorg/sglang:v0.5.8-cu130 for both GB200 and GB300 configs
  • Model prefix correctly set to dsr1 (not dsr1-fp8) per the review feedback
  • New GB300 config added (dsr1-fp8-gb300-dynamo-sglang)
  • CONFIG_FILE-based configuration properly references srt-slurm recipes
  • Proper parallelism settings (TP/EP values match the comments)
  • dp-attn settings are consistent with low-latency vs high-throughput scenarios
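For context, the pieces above fit together in a config entry of roughly this shape. This is a purely illustrative sketch; the actual key names, structure, and recipe paths in nvidia-master.yaml may differ:

```yaml
# Hypothetical sketch only -- real keys/paths in nvidia-master.yaml may differ.
dsr1-fp8-gb300-dynamo-sglang:
  image: lmsysorg/sglang:v0.5.8-cu130   # image version noted in this review
  model-prefix: dsr1                    # precision carried separately, not baked into the prefix
  precision: fp8
  env:
    CONFIG_FILE: recipes/...            # points at an srt-slurm recipe (path elided)
```

Keeping precision out of model-prefix is what lets the launch script branch on PRECISION, and is why the dsr1-fp8-style prefixes were flagged above.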

2. perf-changelog.yaml Updates

  • ✅ Properly documents the config changes
  • ✅ Lists both affected config keys
  • ✅ Includes image version and description

3. Model prefix fix for dsr1-fp8-gb200-dynamo-trt

  • ✅ Line 2289: model-prefix: dsr1 (changed from dsr1-fp8) - addresses the frontend compatibility issue

Minor Observations

  1. YAML indentation: Some comments in the search-space have inconsistent indentation (extra leading space before #). This doesn't affect functionality but could be cleaned up for consistency.

  2. Removed code: The old dynamo-sglang-specific job handling code was removed from launch_gb200-nv.sh (lines 55-97 in the old file). This seems intentional as srt-slurm now handles job submission.


Verdict

🔴 Changes Requested - The shell script has a critical syntax error that will prevent execution. Please fix the missing elif before merging.



@jgangani jgangani force-pushed the jgangani_dsr1_fp8_disagg_sgl_update branch from 28e4ba5 to f3ab616 Compare February 5, 2026 06:14
@jgangani jgangani changed the title Update GB200 and GB300 SGLANG FP8 DSR1 Disaggregated inference configs (STP only) [DRAFT]: Update GB200 and GB300 SGLANG FP8 DSR1 Disaggregated inference configs (STP only) Feb 5, 2026
@jgangani jgangani marked this pull request as draft February 5, 2026 18:10
@functionstackx
Contributor

hi @jgangani

thanks for the PR. I will be removing sweep-enable and cancelling this sweep since it was already failing.

Feel free to re-add it once the patch has been fixed.

https://github.com/InferenceMAX/InferenceMAX/actions/runs/21701039822
