Add Flashinfer DeepGEMM SM90 for SwapAB Optimization #15514

b8zhong · 2025-12-20T05:54:38Z

Motivation

After flashinfer-ai/flashinfer#2131 in Flashinfer, we can benefit from SwapAB, where the order of the two GEMM inputs is swapped. This helps when the M dimension is < 32 (e.g., when BS < 32 in decoding). When M is larger, there is no benefit.
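For readers unfamiliar with the trick, here is a minimal sketch of the identity that SwapAB relies on: instead of computing C = A @ B directly, compute C^T = B^T @ A^T, so the small M dimension ends up on the other axis of the product. This is plain Python for illustration only, not the Flashinfer or DeepGEMM kernel; the helper names (`matmul`, `matmul_swap_ab`, `should_swap_ab`) are hypothetical, and the threshold of 32 is taken from the description above.

```python
def matmul(a, b):
    """Plain-Python product of a (M x K) and b (K x N)."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul_swap_ab(a, b):
    """Compute A @ B with the operands swapped: (B^T @ A^T)^T."""
    return transpose(matmul(transpose(b), transpose(a)))


def should_swap_ab(m, threshold=32):
    """Hypothetical dispatch rule: swap only when M is small."""
    return m < threshold


a = [[1, 2, 3], [4, 5, 6]]                      # M=2, K=3 (small decode batch)
b = [[1, 0, 2, 1], [0, 1, 1, 0], [3, 1, 0, 2]]  # K=3, N=4

# Both orderings give the same result; only the operand layout differs.
assert matmul_swap_ab(a, b) == matmul(a, b)
```

A real kernel would not materialize the transposes; it would read the operands in swapped roles and write the output transposed.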

Modifications

(Requires Flashinfer nightly; the backend currently supports only SM90.)
Note that Flashinfer will compile its own DeepGEMM, so it is separate from the DeepGEMM built in the Docker container.

Accuracy Tests

Benchmarking and Profiling

for ((N=1; N<=128; N*=2)); do
  python3 -m sglang.bench_serving \
    --backend sglang \
    --flush-cache \
    --dataset-name random \
    --random-input-len 1024 \
    --random-output-len 1024 \
    --random-range-ratio 1.0 \
    --num-prompts $((6*N)) \
    --max-concurrency $N \
    --output-file res.jsonl
done

We can see that when the M dimension is small, there is around a 5-8% E2E benefit.
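To make the sweep concrete, the sketch below enumerates the (max-concurrency, num-prompts) pairs that the loop above runs and marks which ones fall under the M < 32 threshold from the Motivation section. It assumes, as a simplification, that the decode batch size roughly tracks `--max-concurrency`; the function and constant names are hypothetical.

```python
SWAP_AB_THRESHOLD = 32  # from the Motivation section: benefit expected when M < 32


def sweep(max_concurrency=128, prompts_per_slot=6):
    """Return the (concurrency, num_prompts) pairs of the benchmark loop."""
    points = []
    n = 1
    while n <= max_concurrency:
        points.append((n, prompts_per_slot * n))
        n *= 2
    return points


for concurrency, num_prompts in sweep():
    # Assumption: decode batch size (M) is approximately the concurrency.
    regime = "SwapAB regime" if concurrency < SWAP_AB_THRESHOLD else "no expected benefit"
    print(f"max-concurrency={concurrency:<4} num-prompts={num_prompts:<4} {regime}")
```

Under that assumption, the five smallest concurrency levels (1 through 16) are the ones where a speedup would be expected.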

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.
Work with maintainers to merge your PR. See the PR Merge Process.


gemini-code-assist · 2025-12-20T05:54:41Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

b8zhong added 3 commits December 18, 2025 10:08

more

af9f3c8

more

9d16495

Merge remote-tracking branch 'upstream/main' into brayden/add-swapab-…

efad674

…sm90

b8zhong requested review from AniZpZ, BBuf, Edwardf0t1, FlamingoPg and ch-wan as code owners December 20, 2025 05:54

Fridge003 mentioned this pull request Dec 21, 2025

Update flashinfer version to 0.6.0 #15551

Draft

6 tasks
