[AMD] Add MTP support for DeepSeek R1 FP8 MI355X SGLang #628

Draft
benenzhu wants to merge 6 commits into InferenceMAX:main from benenzhu:main

@benenzhu

benenzhu commented Feb 4, 2026

Summary

This PR adds MTP (Multi-Token Prediction) support for DeepSeek R1 FP8 on MI355X using SGLang with EAGLE speculative decoding.

SGLang's EAGLE implementation uses DeepSeek R1's native MTP weights to speculatively decode 3 additional tokens per forward pass. This PR uses the default EAGLE configurations.

Changes

  • Add new benchmark script benchmarks/dsr1_fp8_mi355x_mtp.sh with EAGLE speculative decoding configuration
  • Add dsr1-fp8-mi355x-sglang-mtp config entry to .github/configs/amd-master.yaml
  • Update runners/launch_mi355x-amds.sh to support SPEC_SUFFIX for MTP script selection
  • Add perf-changelog entry documenting the changes
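
The SPEC_SUFFIX-based script selection described in the third item could be sketched roughly as follows. The PR does not show the actual runner logic, so the function name `select_benchmark_script`, the base script name `dsr1_fp8_mi355x`, and the convention that the suffix is either empty or `_mtp` are all assumptions made for illustration.

```shell
# Hypothetical sketch: choose a benchmark script from a base name and a
# speculative-decoding suffix. Not the actual code in launch_mi355x-amds.sh.
select_benchmark_script() {
    base="$1"      # assumed base name, e.g. "dsr1_fp8_mi355x"
    suffix="$2"    # assumed values: "" for the baseline run, "_mtp" for MTP
    printf 'benchmarks/%s%s.sh\n' "$base" "$suffix"
}

# With the "_mtp" suffix this resolves to the script added in this PR.
select_benchmark_script "dsr1_fp8_mi355x" "_mtp"
```

An empty suffix would fall back to the non-MTP script, so existing configs would keep resolving to the same path under this scheme.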

Configuration

  • Concurrency sweep: 4 to 128
  • Sequence lengths: 1k1k, 1k8k, 8k1k
  • TP: 8
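
To make the size of this sweep concrete, the sketch below enumerates the benchmark points it would cover. It assumes that concurrency doubles at each step (4, 8, ..., 128) and that a label such as 1k8k means 1024 input tokens and 8192 output tokens; neither detail is stated in the PR, and `sweep_points` is a hypothetical helper name.

```shell
# Hypothetical enumeration of the sweep. Assumes power-of-two concurrency
# steps and that "1k8k" means ISL=1024, OSL=8192.
sweep_points() {
    for seq in "1024 1024" "1024 8192" "8192 1024"; do
        isl="${seq% *}"    # input sequence length (text before the space)
        osl="${seq#* }"    # output sequence length (text after the space)
        conc=4
        while [ "$conc" -le 128 ]; do
            echo "tp=8 isl=$isl osl=$osl conc=$conc"
            conc=$((conc * 2))
        done
    done
}

sweep_points
```

Under these assumptions the sweep covers six concurrency levels for each of the three sequence-length pairs.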

New SGLang Arguments

| Argument | Value | Description |
| --- | --- | --- |
| --speculative-algorithm | EAGLE | Enable MTP speculative decoding |
| --max-running-requests | CONC * 4 | Increased running requests for MTP workload |
| --cuda-graph-max-bs | CONC * 4 | Increased CUDA graph batch size for MTP |
Configurations: TP=8, concurrency 4-128 for 1k1k, 1k8k, and 8k1k sequence lengths
Image: lmsysorg/sglang:v0.5.8-rocm700-mi35x
Co-authored-by: Todd <zhenchen@amd.com>
@cquil11
Collaborator

cquil11 commented Feb 4, 2026

/sweep test-config dsr1-fp8-mi355x-sglang-mtp --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml

@github-actions
Contributor

github-actions bot commented Feb 4, 2026

@cquil11 Kicking off a sweep.

Run: https://github.com/InferenceMAX/InferenceMAX/actions/runs/21689411588
Command: test-config dsr1-fp8-mi355x-sglang-mtp --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml
Pinned ref: 2065877
Approval: not required (trusted collaborator).

@cquil11
Collaborator

cquil11 commented Feb 4, 2026

/sweep test-config --config-keys dsr1-fp8-mi355x-sglang-mtp --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml

@github-actions
Contributor

github-actions bot commented Feb 4, 2026

@cquil11 Kicking off a sweep.

Run: https://github.com/InferenceMAX/InferenceMAX/actions/runs/21689430926
Command: test-config --config-keys dsr1-fp8-mi355x-sglang-mtp --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml
Pinned ref: 2065877
Approval: not required (trusted collaborator).

@benenzhu
Author

benenzhu commented Feb 5, 2026

/sweep test-config --config-keys dsr1-fp8-mi355x-sglang-mtp --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml

Oh, the sweep failed with a weird error at the graph capture stage. I will convert this PR to a draft and try to fix it locally. The use-chat-template config also needs some tuning.
Thanks for your review.

benenzhu marked this pull request as draft February 5, 2026 02:40