@mgoin
Member

@mgoin mgoin commented Aug 27, 2025

Purpose

Retune the Triton FP8 block-wise dense GEMM configs for modern Triton. Also add a simple benchmark script that only measures performance and does not tune.

This mostly improves performance for smaller M, but crucially it also improves the N=576, K=7168 shape.
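
For readers unfamiliar with per-shape tuned configs, here is a minimal sketch of how one might be selected at runtime. It assumes one JSON file per (N, K) shape that maps tuned M values to Triton launch parameters, and that the entry with the nearest tuned M is used. The helper names (`load_configs`, `select_config`) and all parameter values are hypothetical and not taken from this PR.

```python
import json

# Hypothetical tuned config for one (N, K) shape. Keys are the M values that
# were tuned; the parameter values are illustrative and not taken from this PR.
EXAMPLE_CONFIG_JSON = """
{
  "1": {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 3},
  "64": {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 128,
         "GROUP_SIZE_M": 8, "num_warps": 4, "num_stages": 3},
  "1024": {"BLOCK_SIZE_M": 128, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 128,
           "GROUP_SIZE_M": 16, "num_warps": 8, "num_stages": 3}
}
"""


def load_configs(text):
    """Parse the JSON and convert the string keys to integer M values."""
    return {int(m): cfg for m, cfg in json.loads(text).items()}


def select_config(configs, m):
    """Return the config whose tuned M is closest to the requested M."""
    nearest_m = min(configs, key=lambda tuned_m: abs(tuned_m - m))
    return configs[nearest_m]


configs = load_configs(EXAMPLE_CONFIG_JSON)
# M=48 is closest to the tuned M=64 entry in this hypothetical file.
print(select_config(configs, 48))
```

Under this assumption, retuning matters most for small M because each small batch size maps to its own entry, so a stale entry directly affects the launch parameters used for that batch size.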

Test Plan

Test Result

H100

[3 screenshots, images not preserved: 2025-08-27 at 5:30:16 PM, 5:30:42 PM, 5:30:49 PM]

H200

[3 screenshots, images not preserved: 2025-08-27 at 5:32:04 PM, 5:32:08 PM, 5:32:14 PM]

Signed-off-by: mgoin <mgoin64@gmail.com>
@mergify mergify bot added the performance (Performance-related issues) label Aug 27, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces performance tuning configurations for the FP8 block-wise GEMM kernel on H100/H200 GPUs, specifically for matrix shapes found in the DeepSeek-V3 model. It includes a new benchmark script to compare the performance of the w8a8-block-fp8 kernel against the standard bfloat16 GEMM, along with numerous new and updated JSON files containing the tuned kernel parameters. The changes appear to be correct and well-aligned with the goal of improving performance. The new benchmark script is well-implemented, and the configuration files are consistent with the output of a tuning process. I have not identified any issues of high or critical severity.
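
As a rough illustration of the kind of comparison such a benchmark reports, the sketch below converts measured GEMM latencies into throughput and a speedup ratio. It relies only on the standard count of 2·M·N·K floating-point operations for an (M, K) × (K, N) matrix multiply. The function names and the timings are hypothetical placeholders, not output from the script in this PR.

```python
def gemm_tflops(m, n, k, seconds):
    """Throughput in TFLOP/s for an (M, K) x (K, N) GEMM (2*M*N*K FLOPs)."""
    return 2 * m * n * k / seconds / 1e12


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Hypothetical latencies in seconds; these are NOT measurements from this PR.
bf16_seconds = 20e-6
fp8_seconds = 10e-6

m, n, k = 64, 576, 7168
print(f"bf16: {gemm_tflops(m, n, k, bf16_seconds):.2f} TFLOP/s")
print(f"fp8:  {gemm_tflops(m, n, k, fp8_seconds):.2f} TFLOP/s")
print(f"speedup: {speedup(bf16_seconds, fp8_seconds):.2f}x")
```

Reporting both numbers is a design choice: throughput makes different shapes comparable to each other, while the speedup ratio shows directly whether the FP8 path beats the bfloat16 baseline for a given M.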

Signed-off-by: mgoin <mgoin64@gmail.com>
Member

@yewentao256 yewentao256 left a comment

LGTM, thanks for the work!

@mgoin mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) label Aug 27, 2025
@mgoin mgoin changed the title from "Tune configs for triton block fp8 gemm H100/H200" to "[Perf] Tune configs for triton block fp8 gemm H100/H200" Aug 27, 2025
@DarkLight1337 DarkLight1337 merged commit a781e84 into vllm-project:main Aug 28, 2025
56 checks passed
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Sep 3, 2025
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025