Fix a performance comparison issue in Benchmark Suite #23047

louie-tsai · 2025-08-17T06:00:14Z

Purpose

When comparing different benchmark_results.json files, the comparison table can mismatch models.
This PR fixes the issue so that results from different benchmark_results.json files are compared correctly.

Without the fix, the table might compare performance across different models.

After the fix, it only compares performance for the same model.
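
To illustrate the mismatch, here is a minimal sketch with made-up rows. The field names `model`, `test_name`, and `tput` and all values are assumptions for illustration, not the actual schema of benchmark_results.json. Pairing rows by position compares different models as soon as the two files list them in a different order.

```python
# Hypothetical rows; field names and numbers are illustrative only.
run_a = [
    {"model": "model-x", "test_name": "serving_qps_1", "tput": 10.0},
    {"model": "model-y", "test_name": "serving_qps_1", "tput": 20.0},
]
run_b = [
    {"model": "model-y", "test_name": "serving_qps_1", "tput": 22.0},
    {"model": "model-x", "test_name": "serving_qps_1", "tput": 11.0},
]


def mismatched_pairs(a_rows, b_rows):
    """Return the (model_a, model_b) pairs that a positional comparison would mix up."""
    return [
        (a["model"], b["model"])
        for a, b in zip(a_rows, b_rows)
        if a["model"] != b["model"]
    ]


# Both positional pairs compare one model against the other.
print(mismatched_pairs(run_a, run_b))
```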

Test Plan

Tested manually:
python3 .buildkite/nightly-benchmarks/scripts/compare-json-results.py -f A/benchmark_results.json B/benchmark_results.json

Test Result

(Optional) Documentation Update

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

github-actions · 2025-08-17T06:00:22Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request significantly improves the benchmark comparison script by aligning results based on key columns instead of relying on row order, which makes the comparison much more robust. The new logic is well-structured and handles various edge cases. I have a few suggestions to further refine the implementation, focusing on using up-to-date library APIs, removing redundant code, and clarifying the usage of the comparison script in the CI workflow. Overall, this is a great enhancement.
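
The idea of aligning on key columns rather than row order can be sketched as follows. This is a library-free illustration, not the script's implementation: the key columns (`model`, `test_name`), the metric name `tput`, and the sample rows are all assumptions.

```python
# Hypothetical rows; field names and numbers are illustrative only.
run_a = [
    {"model": "model-x", "test_name": "serving_qps_1", "tput": 8.0},
    {"model": "model-y", "test_name": "serving_qps_1", "tput": 20.0},
]
run_b = [
    {"model": "model-y", "test_name": "serving_qps_1", "tput": 25.0},
    {"model": "model-x", "test_name": "serving_qps_1", "tput": 10.0},
    {"model": "model-z", "test_name": "serving_qps_1", "tput": 5.0},
]

KEY_COLUMNS = ("model", "test_name")  # assumed key columns


def align_by_keys(a_rows, b_rows, keys=KEY_COLUMNS, metric="tput"):
    """Pair rows that share the same key values, ignoring row order."""
    # Index the second file by its key tuple for direct lookup.
    b_index = {tuple(row[k] for k in keys): row for row in b_rows}
    aligned = []
    for a in a_rows:
        key = tuple(a[k] for k in keys)
        b = b_index.get(key)
        if b is None:
            continue  # no counterpart in the other file, so nothing to compare
        aligned.append(
            {"key": key, "a": a[metric], "b": b[metric], "ratio": b[metric] / a[metric]}
        )
    return aligned


for row in align_by_keys(run_a, run_b):
    print(row)
```

Rows present in only one file (here the hypothetical `model-z`) are dropped, which corresponds to an inner join on the key columns.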

.buildkite/nightly-benchmarks/scripts/compare-json-results.py

.buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh

.buildkite/nightly-benchmarks/scripts/compare-json-results.py

bigPYJ1151 · 2025-08-19T02:18:18Z

.buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh

Perhaps we can add an environment variable for exporting comparison reports, such as EXPORT_COMPARISON. It looks like this is only needed for interactive benchmarking.

louie-tsai · 2025-08-20

Removed it from the run script and left it for interactive performance comparison only.
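
For reference, an opt-in switch of the kind suggested in this thread might look like the sketch below. `EXPORT_COMPARISON` is the name proposed in the review comment; the function name and the set of accepted truthy values are assumptions, and the merged change dropped the export from the run script rather than adding such a variable.

```python
import os


def export_enabled(env=None):
    """Return True when the hypothetical EXPORT_COMPARISON switch is turned on."""
    env = os.environ if env is None else env
    value = env.get("EXPORT_COMPARISON", "0")
    # Accepted truthy spellings are an assumption for this sketch.
    return value.strip().lower() in {"1", "true", "yes"}


if export_enabled():
    print("comparison report export requested")
else:
    print("comparison report export skipped")
```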

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com>

Co-authored-by: Li, Jiang <bigpyj64@gmail.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com>

…3047) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Li, Jiang <bigpyj64@gmail.com>

…3047) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Li, Jiang <bigpyj64@gmail.com> Signed-off-by: Duncan Moss <djm.moss@gmail.com>


…3047) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Li, Jiang <bigpyj64@gmail.com> Signed-off-by: Xiao Yu <xiao.yu@amd.com>


mergify bot added the ci/build and performance (Performance-related issues) labels Aug 17, 2025

gemini-code-assist bot reviewed Aug 17, 2025


louie-tsai changed the title ~~[WIP]Fix a performance comparison issue in Benchmark Suite~~ Fix a performance comparison issue in Benchmark Suite Aug 19, 2025

louie-tsai force-pushed the benmark_suite_fixes branch from d84e507 to 493b0d0 Compare August 19, 2025 02:05

bigPYJ1151 reviewed Aug 19, 2025


louie-tsai force-pushed the benmark_suite_fixes branch 3 times, most recently from 32b3513 to 06360cf Compare August 20, 2025 01:25

louie-tsai requested a review from bigPYJ1151 August 20, 2025 01:29

louie-tsai and others added 5 commits August 19, 2025 18:30

fix an issue for comparing serving, latency, throughput together

1ffd59a

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

Fix the model mismatch in the tables

39826ff

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

error check

7578420

Signed-off-by: Tsai, Louie <louie.tsai@intel.com>

Update .buildkite/nightly-benchmarks/scripts/compare-json-results.py

8ec15c8

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com>

Update .buildkite/nightly-benchmarks/scripts/compare-json-results.py

04322bb

Co-authored-by: Li, Jiang <bigpyj64@gmail.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com>

louie-tsai force-pushed the benmark_suite_fixes branch from 06360cf to 04322bb Compare August 20, 2025 01:30

bigPYJ1151 approved these changes Aug 20, 2025


bigPYJ1151 enabled auto-merge (squash) August 20, 2025 02:23

github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label Aug 20, 2025

bigPYJ1151 merged commit 941f568 into vllm-project:main Aug 20, 2025
24 of 27 checks passed
