[AMD] Add MTP support for DeepSeek R1 FP8 MI355X SGLang by benenzhu · Pull Request #628 · InferenceMAX/InferenceMAX

benenzhu · 2026-02-04T10:20:08Z

Summary

This PR adds MTP (Multi-Token Prediction) support for DeepSeek R1 FP8 on MI355X using SGLang with EAGLE speculative decoding.

SGLang's EAGLE implementation uses DeepSeek R1's native MTP weights to speculatively decode 3 additional tokens per forward pass. This PR uses the default EAGLE configurations.

Changes

Add new benchmark script benchmarks/dsr1_fp8_mi355x_mtp.sh with EAGLE speculative decoding configuration
Add dsr1-fp8-mi355x-sglang-mtp config entry to .github/configs/amd-master.yaml
Update runners/launch_mi355x-amds.sh to support SPEC_SUFFIX for MTP script selection
Add perf-changelog entry documenting the changes
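
As a rough illustration of the SPEC_SUFFIX change, a launcher could build the benchmark script path from a base name plus an optional suffix. This is a minimal sketch, not the contents of runners/launch_mi355x-amds.sh: the function name `select_benchmark_script`, the `_mtp` suffix value, and the empty-suffix fallback are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of suffix-based script selection.
# Only SPEC_SUFFIX and the benchmarks/ path come from the PR text;
# the function name and the "_mtp" suffix value are assumptions.
select_benchmark_script() {
    local base="$1"          # e.g. dsr1_fp8_mi355x
    local spec_suffix="$2"   # e.g. "_mtp", or empty for the non-MTP script
    printf 'benchmarks/%s%s.sh\n' "$base" "$spec_suffix"
}

# With an "_mtp" suffix, the MTP script is selected.
select_benchmark_script dsr1_fp8_mi355x "_mtp"
# With an empty suffix, the path falls back to the base script name.
select_benchmark_script dsr1_fp8_mi355x ""
```

Keeping the suffix as a separate variable means the existing non-MTP path stays unchanged when SPEC_SUFFIX is unset.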

Configuration

Concurrency sweep: 4 to 128
Sequence lengths: 1k1k, 1k8k, 8k1k
TP: 8
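
To make the size of the sweep concrete, the configuration above can be enumerated as follows. This is a sketch under a stated assumption: the PR only gives the concurrency range (4 to 128), so the doubling step used here is a guess, and `sweep_points` is a hypothetical helper rather than part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical enumeration of the sweep described above.
# ASSUMPTION: concurrency doubles from 4 up to 128; the PR states only the range.
sweep_points() {
    local seq conc
    for seq in 1k1k 1k8k 8k1k; do
        conc=4
        while [ "$conc" -le 128 ]; do
            printf 'tp=8 seq=%s conc=%s\n' "$seq" "$conc"
            conc=$((conc * 2))
        done
    done
}

sweep_points
```

Under the doubling assumption this yields six concurrency values per sequence-length pair.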

New SGLang Arguments

| Argument | Value | Description |
| --- | --- | --- |
| `--speculative-algorithm` | EAGLE | Enable MTP speculative decoding |
| `--max-running-requests` | CONC * 4 | Increased running requests for MTP workload |
| `--cuda-graph-max-bs` | CONC * 4 | Increased CUDA graph batch size for MTP |
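
The arguments in the table can be derived from the concurrency value roughly as shown below. This is a minimal sketch, not the actual contents of benchmarks/dsr1_fp8_mi355x_mtp.sh: `build_mtp_args` is a hypothetical helper, and only the three flags and the CONC * 4 scaling come from the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper that prints the MTP-related server arguments,
# one token per line, for a given concurrency (CONC).
build_mtp_args() {
    local conc="$1"
    local scaled=$((conc * 4))   # CONC * 4, as listed in the table
    printf '%s\n' \
        --speculative-algorithm EAGLE \
        --max-running-requests "$scaled" \
        --cuda-graph-max-bs "$scaled"
}

# Example: the arguments for CONC=16.
build_mtp_args 16
```

In a real script the output would be appended to the existing server launch command, which already carries the model path and TP setting.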

- Add benchmark script benchmarks/dsr1_fp8_mi355x_mtp.sh with EAGLE speculative decoding
- Add dsr1-fp8-mi355x-sglang-mtp config entry to .github/configs/amd-master.yaml
- Update runners/launch_mi355x-amds.sh to support SPEC_SUFFIX for MTP script selection
- Add perf-changelog entry documenting the changes

Configurations: TP=8, concurrency 4-128 for 1k1k, 1k8k, and 8k1k sequence lengths

Image: lmsysorg/sglang:v0.5.8-rocm700-mi35x

Co-authored-by: Todd zhenchen@amd.com

benchmarks/dsr1_fp8_mi355x_mtp.sh

cquil11 · 2026-02-04T21:39:37Z

/sweep test-config dsr1-fp8-mi355x-sglang-mtp --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml

github-actions · 2026-02-04T21:39:47Z

@cquil11 Kicking off a sweep.

Run: https://github.com/InferenceMAX/InferenceMAX/actions/runs/21689411588
Command: test-config dsr1-fp8-mi355x-sglang-mtp --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml
Pinned ref: 2065877
Approval: not required (trusted collaborator).

cquil11 · 2026-02-04T21:40:17Z

/sweep test-config --config-keys dsr1-fp8-mi355x-sglang-mtp --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml

github-actions · 2026-02-04T21:40:29Z

@cquil11 Kicking off a sweep.

Run: https://github.com/InferenceMAX/InferenceMAX/actions/runs/21689430926
Command: test-config --config-keys dsr1-fp8-mi355x-sglang-mtp --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml
Pinned ref: 2065877
Approval: not required (trusted collaborator).

benenzhu · 2026-02-05T02:40:04Z

/sweep test-config --config-keys dsr1-fp8-mi355x-sglang-mtp --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml

Oh, it failed with a weird error at the graph capture stage. I will convert this PR to a draft and try to fix it locally. The use-chat-template config also needs some tuning.
Thanks for your review.

benenzhu added 2 commits February 4, 2026 10:03

fix

3d9480a

benenzhu requested a review from a team as a code owner February 4, 2026 10:20

github-project-automation bot added this to InferenceMAX Board Feb 4, 2026

benenzhu added 2 commits February 4, 2026 10:21

fix docs

8f36c68

fix docs

660ba3e

cquil11 reviewed Feb 4, 2026

benchmarks/dsr1_fp8_mi355x_mtp.sh (resolved)

benenzhu and others added 2 commits February 4, 2026 15:17

fix add use-chat-template

7a78682

Merge branch 'main' into main

2065877

benenzhu marked this pull request as draft February 5, 2026 02:40

benenzhu wants to merge 6 commits into InferenceMAX:main from benenzhu:main
