[WIP] [NV] update DSR1 SGLang MTP configs on single node B200#631
Put each argument on its own line for easier reading and maintenance. Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
@claude review plz
Claude finished @functionstackx's task in 1m 21s
PR Review Complete
Summary
This PR adds MTP (Multi-Token Prediction) support for DeepSeek R1 FP8 on B200 using SGLang's EAGLE speculative decoding. It includes a new benchmark script, config updates, and launcher modifications.

Changes Overview
Issues Found

🔴 High Priority
1. Incorrect PR link in
The pr-link references PR #626 but this is PR #631:
    pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/626 # Should be #631
2. Inconsistency:

🟡 Medium Priority
3. Higher MTP script uses 512 vs 128 in non-MTP:
This is consistent with the higher
4. Changelog description says "concurrency 4-64" but config has "conc-end: 512"
But the actual config shows
    - { tp: 8, ep: 1, conc-start: 4, conc-end: 512, spec-decoding: mtp }

🟢 Low Priority / Suggestions
5. MTP script only supports TP=8, but this is consistent with the config
The script has a guard at line 31-34:
    if [[ $TP -ne 8 ]]; then
        echo "MTP only supports TP=8, got TP=$TP!"
        exit 1
    fi
This is good and matches the config which only has TP=8 configurations. Just noting for awareness.
6. Non-MTP FP4 config also updated to v0.5.8
The

Code Quality
✅ Good practices observed:
✅ EAGLE speculative decoding config looks reasonable:
Questions for the Author
Recommendation
Request changes - Please fix the pr-link and concurrency description in
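A mismatch like the pr-link one in item 1 could be caught mechanically. The following is only a minimal sketch, not part of this repository: `pr_number_from_link` and `check_pr_link` are hypothetical helper names, and it assumes the link always has the form `.../pull/<number>`.

```shell
# Hypothetical helper (not from the repo): print the PR number at the end
# of a GitHub pull URL, or fail if the URL does not end in /pull/<digits>.
pr_number_from_link() {
  prl_num="${1##*/pull/}"   # drop everything up to the last "/pull/"
  case "$prl_num" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
  esac
  printf '%s\n' "$prl_num"
}

# Hypothetical check: succeed only when the link points at the expected PR.
# Returns 2 if the link cannot be parsed, 1 on a mismatch.
check_pr_link() {
  cpl_actual="$(pr_number_from_link "$1")" || return 2
  [ "$cpl_actual" = "$2" ]
}
```

For example, `check_pr_link "https://github.com/InferenceMAX/InferenceMAX/pull/626" 631` returns a non-zero status, which is the case flagged in item 1.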
This PR updates the NVIDIA B200 configs for the SGLang DSR1 agg configs and is a follow-up to #626. Currently I branched from yunzhoul/update-sglang-mtp-configs because there are some changes needed for this; I will clean up the git branching once that is merged in.