
[DRAFT]: Update GB200 and GB300 SGLANG FP8 DSR1 Disaggregated inference configs (STP only)#635

Draft
jgangani wants to merge 9 commits into main from
jgangani_dsr1_fp8_disagg_sgl_update

Conversation

@jgangani
Collaborator

@jgangani jgangani commented Feb 5, 2026

No description provided.

@jgangani jgangani requested a review from jthomson04 February 5, 2026 00:39
@functionstackx functionstackx changed the title Update GB200 and GB300 SGLANG FP8 DSR1 Disaggregated inference configs Update GB200 and GB300 SGLANG FP8 STP DSR1 Disaggregated inference configs Feb 5, 2026
@functionstackx functionstackx changed the title Update GB200 and GB300 SGLANG FP8 STP DSR1 Disaggregated inference configs Update GB200 and GB300 SGLANG FP8 DSR1 Disaggregated inference configs (STP only) Feb 5, 2026
@jgangani
Collaborator Author

jgangani commented Feb 5, 2026

@claude Review.

@Klaud-Cold
Collaborator

Klaud-Cold commented Feb 5, 2026

Claude finished @jgangani's task in 1m 17s


PR Review Complete

  • Gather context and read changed files
  • Review .github/configs/nvidia-master.yaml changes
  • Review perf-changelog.yaml changes
  • Review runners/launch_gb200-nv.sh changes
  • Check for any issues from previous review comments
  • Provide comprehensive review feedback

Summary

This PR updates GB200 and GB300 SGLANG FP8 DSR1 Disaggregated inference configs for STP mode, including new image version, updated parallelism settings, and CONFIG_FILE-based configuration.


🔴 Critical Issues

1. Missing elif in launch_gb200-nv.sh (Syntax Error)

File: runners/launch_gb200-nv.sh:12-20

The shell script is missing an elif keyword, which causes bash to exit with a syntax error when it parses the script:

if [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp4" ]]; then
    export MODEL_PATH="/mnt/lustre01/models/deepseek-r1-0528-fp4-v2/"
    export SRT_SLURM_MODEL_PREFIX="dsfp4"
if [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp8" ]]; then  # ❌ Missing 'elif'
    export MODEL_PATH="/mnt/numa1/groups/sa-shared/models/deepseek-r1-0528/"
    export SRT_SLURM_MODEL_PREFIX="dsr1-fp8"
else
    export MODEL_PATH=$MODEL
fi

Fix: Change the second if to elif:

if [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp4" ]]; then
    export MODEL_PATH="/mnt/lustre01/models/deepseek-r1-0528-fp4-v2/"
    export SRT_SLURM_MODEL_PREFIX="dsfp4"
elif [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp8" ]]; then
    export MODEL_PATH="/mnt/numa1/groups/sa-shared/models/deepseek-r1-0528/"
    export SRT_SLURM_MODEL_PREFIX="dsr1-fp8"
else
    export MODEL_PATH=$MODEL
fi

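The corrected if/elif/else chain can be sanity-checked before it ever reaches a runner. A minimal sketch (hypothetical values and a temp file, not the real launch script or model paths):

```shell
# Write a stripped-down version of the corrected branch logic to a temp file,
# syntax-check it with `bash -n` (parse only, no execution), then run it.
cat > /tmp/model_prefix_demo.sh <<'EOF'
MODEL_PREFIX="dsr1"
PRECISION="fp8"
if [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp4" ]]; then
    export SRT_SLURM_MODEL_PREFIX="dsfp4"
elif [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp8" ]]; then
    export SRT_SLURM_MODEL_PREFIX="dsr1-fp8"
else
    export SRT_SLURM_MODEL_PREFIX="$MODEL_PREFIX"
fi
echo "$SRT_SLURM_MODEL_PREFIX"
EOF
bash -n /tmp/model_prefix_demo.sh && bash /tmp/model_prefix_demo.sh  # prints: dsr1-fp8
```

With elif, exactly one branch fires and a single fi closes the whole chain.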

2. Missing fi Statement in launch_gb200-nv.sh

File: runners/launch_gb200-nv.sh:12-20

The first if block is never closed with fi. This will cause the script to fail.

Current code:

if [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp4" ]]; then
    ...
if [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp8" ]]; then  # This needs to be elif
    ...
else
    ...
fi  # Only one fi, but two 'if' statements

Fix: Use elif instead of a nested if (as mentioned above), which will properly close the structure with a single fi.
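Both issues can be caught before merging with bash's no-exec mode. A minimal sketch (hypothetical temp file reproducing the bug pattern): a second bare if instead of elif leaves the outer if unclosed, and `bash -n` reports it without executing anything.

```shell
# Reproduce the bug shape: the lone 'fi' closes the inner 'if',
# leaving the outer 'if' unterminated.
cat > /tmp/broken_if_demo.sh <<'EOF'
if [[ "$PRECISION" == "fp4" ]]; then
    echo fp4
if [[ "$PRECISION" == "fp8" ]]; then
    echo fp8
else
    echo default
fi
EOF
# `bash -n` parses without running; an unclosed 'if' makes it exit nonzero
# with an "unexpected end of file" diagnostic.
if bash -n /tmp/broken_if_demo.sh 2>/dev/null; then
    echo "syntax OK"
else
    echo "syntax error detected"
fi
```

Running `bash -n` in CI (or a pre-commit hook) would have flagged this script before any Slurm job was launched.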


⚠️ Warnings

1. Inconsistent SRT_SLURM_MODEL_PREFIX for dynamo-trt

File: runners/launch_gb200-nv.sh:21-34

The dynamo-trt block still matches on MODEL_PREFIX == "dsr1-fp8" (line 28), which is inconsistent with the fix applied to dynamo-sglang. The review feedback from PR #627 suggested using the MODEL_PREFIX == "dsr1" && PRECISION == "fp8" pattern to avoid breaking the frontend.

elif [[ $MODEL_PREFIX == "dsr1-fp8" ]]; then  # ⚠️ Still uses dsr1-fp8
    export MODEL_PATH="/mnt/numa1/groups/sa-shared/models/deepseek-r1-0528/"
    export SERVED_MODEL_NAME="deepseek-r1-fp8"

Recommendation: Apply the same pattern to dynamo-trt for consistency:

elif [[ $MODEL_PREFIX == "dsr1" && $PRECISION == "fp8" ]]; then

Note: This may be intentional if dynamo-trt FP8 configs haven't been updated yet, but it will cause issues when they are.


✅ Good Changes

1. nvidia-master.yaml Updates

  • Image updated to lmsysorg/sglang:v0.5.8-cu130 for both GB200 and GB300 configs
  • Model prefix correctly set to dsr1 (not dsr1-fp8) per the review feedback
  • New GB300 config added (dsr1-fp8-gb300-dynamo-sglang)
  • CONFIG_FILE-based configuration properly references srt-slurm recipes
  • Proper parallelism settings (TP/EP values match the comments)
  • dp-attn settings are consistent with low-latency vs high-throughput scenarios
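For context, the pieces above fit together in a config entry of roughly this shape. This is a purely illustrative sketch; the actual key names, structure, and recipe paths in nvidia-master.yaml may differ:

```yaml
# Hypothetical sketch only -- real keys/paths in nvidia-master.yaml may differ.
dsr1-fp8-gb300-dynamo-sglang:
  image: lmsysorg/sglang:v0.5.8-cu130   # image version noted in this review
  model-prefix: dsr1                    # precision carried separately, not baked into the prefix
  precision: fp8
  env:
    CONFIG_FILE: recipes/...            # points at an srt-slurm recipe (path elided)
```

Keeping precision out of model-prefix is what lets the launch script branch on PRECISION, and is why the dsr1-fp8-style prefixes were flagged above.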

2. perf-changelog.yaml Updates

  • ✅ Properly documents the config changes
  • ✅ Lists both affected config keys
  • ✅ Includes image version and description

3. Model prefix fix for dsr1-fp8-gb200-dynamo-trt

  • ✅ Line 2289: model-prefix: dsr1 (changed from dsr1-fp8) - addresses the frontend compatibility issue

Minor Observations

  1. YAML indentation: Some comments in the search-space have inconsistent indentation (extra leading space before #). This doesn't affect functionality but could be cleaned up for consistency.

  2. Removed code: The old dynamo-sglang-specific job handling code was removed from launch_gb200-nv.sh (lines 55-97 in the old file). This seems intentional as srt-slurm now handles job submission.


Verdict

🔴 Changes Requested - The shell script has a critical syntax error that will prevent execution. Please fix the missing elif before merging.



@jgangani jgangani force-pushed the jgangani_dsr1_fp8_disagg_sgl_update branch from 28e4ba5 to f3ab616 Compare February 5, 2026 06:14
@jgangani jgangani changed the title Update GB200 and GB300 SGLANG FP8 DSR1 Disaggregated inference configs (STP only) [DRAFT]: Update GB200 and GB300 SGLANG FP8 DSR1 Disaggregated inference configs (STP only) Feb 5, 2026
@jgangani jgangani marked this pull request as draft February 5, 2026 18:10
@functionstackx
Contributor

hi @jgangani

thanks for the PR. I will be removing sweep-enable and cancelling this sweep since it was already failing.

Feel free to re-add it once the patch has been fixed.

https://github.com/InferenceMAX/InferenceMAX/actions/runs/21701039822
