[TRTLLM-8260][feat] Add Server-Client Perf Test in pytest for B200 and B300 #7985
Conversation
📝 Walkthrough

Adds a Perf-Sanity test pathway to the Jenkins L0 pipeline with a new filter flag and parallel job orchestration. Introduces and updates perf-sanity scripts: a Docker-run wrapper, a benchmark runner with retries and reproduction scripts, and a result parser with filename-based extraction and dynamic metrics; replaces config YAMLs (removes the old one, adds l0_dgx_b200).
Sequence Diagram

```mermaid
sequenceDiagram
    autonumber
    participant Dev as Jenkins L0 Pipeline
    participant PS as Perf-Sanity Stage
    participant SH as benchmark-serve.sh
    participant PY as run_benchmark_serve.py
    participant Srv as TRT-LLM Server
    participant Bmk as Benchmark Client
    participant Pars as parse_benchmark_results.py
    Dev->>PS: Launch stage (ONLY_PERF_SANITY_TEST or stageName contains "Perf-Sanity")
    PS->>SH: docker run ... benchmark-serve.sh --config_file l0_dgx_b200.yaml ...
    Note over SH: Validate dirs, build mounts, optional TRT-LLM install
    SH->>PY: python run_benchmark_serve.py --config_file ...
    PY->>PY: initialize_test_case_infos()
    loop per test_case with retries
        PY->>Srv: start server
        PY->>PY: wait_for_server() and cleanup caches
        alt server ready
            loop per concurrency
                PY->>Bmk: run benchmark
                Bmk-->>PY: status/logs
                PY->>PY: handle success/error, timeouts
            end
            PY->>Srv: terminate server
        else server not ready
            PY->>PY: mark failed and retry/abort
        end
    end
    PY->>PY: generate_reproduction_scripts()
    SH->>Pars: parse_benchmark_results.py --config_file ...
    Pars-->>SH: CSV output
    SH-->>PS: Exit code
    PS-->>Dev: Stage result
```
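The per-test-case loop in the diagram can be sketched as follows; the function names and the `test_case` fields here are illustrative stand-ins for the actual `run_benchmark_serve.py` internals, not its real API:

```python
def run_test_case(test_case, start_server, wait_for_server, run_benchmark,
                  max_retries=2):
    """Start a server, run one benchmark per concurrency, retry on failure.

    start_server/wait_for_server/run_benchmark are injected callables so the
    control flow (retry, readiness check, per-concurrency loop, teardown)
    can be shown without the real server machinery.
    """
    for _attempt in range(max_retries):
        server = start_server(test_case)
        try:
            if not wait_for_server(server):
                continue  # server never became ready; terminate and retry
            results = {}
            for concurrency in test_case["concurrencies"]:
                results[concurrency] = run_benchmark(server, concurrency)
            return results
        finally:
            # Runs on return, on retry, and on error: server always torn down.
            server.terminate()
    return None  # all retries exhausted
```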
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
tests/scripts/perf-sanity/benchmark-serve.sh (1)
77-82: Fix variable expansion in file existence check

Single quotes prevent ${bench_dir} expansion; the check never triggers.

Apply:

```diff
- if [[ -f '${bench_dir}/parse_benchmark_results.py' ]]; then
+ if [[ -f "${bench_dir}/parse_benchmark_results.py" ]]; then
```

tests/scripts/perf-sanity/parse_benchmark_results.py (1)
1-1: Add NVIDIA Apache-2.0 header (2025) per repo guidelines

All .py files must include the NVIDIA Apache-2.0 header.

Apply (below the shebang):

```diff
 #!/usr/bin/env python3
+#
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
```

tests/scripts/perf-sanity/run_benchmark_serve.py (2)
1-1: Add NVIDIA Apache-2.0 header (2025) per repo guidelines

All .py files must include the NVIDIA Apache-2.0 header.

Apply (below the shebang):

```diff
 #!/usr/bin/env python3
+#
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
```
657-666: Always pass --model as separate args to trtllm-serve

Avoid relying on positional parsing; split flags/values as distinct argv elements.

Apply:

```diff
-    serve_cmd = [
-        "trtllm-serve", MODEL, "--backend", "pytorch", "--tp_size",
+    serve_cmd = [
+        "trtllm-serve", "--model", model_identifier, "--backend", "pytorch", "--tp_size",
         str(test_case['tp']), "--ep_size", str(test_case['ep']),
         "--max_batch_size", str(test_case['max_batch_size']),
         "--max_num_tokens", str(test_case['max_num_tokens']),
         "--kv_cache_free_gpu_memory_fraction", str(test_case['free_gpu_mem_fraction']),
         "--extra_llm_api_options", config_path
     ]
```
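As a sketch, the suggested argv could be assembled with a small helper; `build_serve_cmd` and the placeholder model identifier and values below are illustrative, not the script's actual code:

```python
def build_serve_cmd(model_identifier, test_case, config_path):
    """Build the trtllm-serve argv with every flag and value as a
    separate element, so nothing depends on shell word splitting."""
    return [
        "trtllm-serve", "--model", model_identifier,
        "--backend", "pytorch",
        "--tp_size", str(test_case["tp"]),
        "--ep_size", str(test_case["ep"]),
        "--max_batch_size", str(test_case["max_batch_size"]),
        "--max_num_tokens", str(test_case["max_num_tokens"]),
        "--kv_cache_free_gpu_memory_fraction",
        str(test_case["free_gpu_mem_fraction"]),
        "--extra_llm_api_options", config_path,
    ]
```

A list built this way can be handed directly to subprocess.Popen without shell=True.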
🧹 Nitpick comments (5)
tests/scripts/perf-sanity/benchmark-serve.sh (1)
68-69: Quote PYTHONPATH assignment when injecting trtllm_dir

Prevents issues with paths containing spaces or colons.

Apply:

```diff
- export PYTHONPATH=$trtllm_dir
+ export PYTHONPATH="$trtllm_dir"
```

tests/scripts/perf-sanity/parse_benchmark_results.py (2)
194-197: Remove unnecessary f-strings

No placeholders present; flagged by linters.

Apply:

```diff
- print(f"Successfully extracted configuration from filename")
+ print("Successfully extracted configuration from filename")
@@
- print(f"Could not extract configuration from filename either")
+ print("Could not extract configuration from filename either")
```
186-189: Avoid bare except; catch specific exceptions

Catching Exception hides parsing errors and makes debugging harder. Use (e.g.) FileNotFoundError, OSError, or UnicodeDecodeError as appropriate around file reads; same for extract_and_update_metrics.
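A minimal sketch of the recommended pattern, using a hypothetical result-file read rather than the parser's actual code:

```python
def read_result_file(path):
    """Read a result file into lines; only the I/O is guarded."""
    try:
        with open(path, encoding="utf-8") as f:
            text = f.read()
    except (OSError, UnicodeDecodeError) as exc:
        # Only I/O and decoding failures are swallowed; genuine bugs
        # elsewhere still surface as tracebacks.
        print(f"Could not read {path}: {exc}")
        return None
    else:
        # Main logic lives in else, keeping the try body minimal.
        return text.splitlines()
```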
tests/scripts/perf-sanity/run_benchmark_serve.py (2)
900-910: Reproduction script: use resolved id in benchmark --model

Align the benchmark invocation with the fixed identifier.

Apply:

```diff
- --model {model_path} \\
+ --model {model_identifier} \\
```
384-390: Consider avoiding shell=True where not necessary

The cleanup commands can be replaced with subprocess.run([...]) for safer invocation.
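For illustration, a shell=True call can be converted to an argument-list invocation like this; the commands and path below are placeholders, not the script's actual cleanup commands:

```python
import subprocess

def cleanup_caches(dry_run=True):
    """Run cleanup commands as argument lists instead of shell strings."""
    commands = [
        ["sync"],                             # flush filesystem buffers
        ["rm", "-rf", "/tmp/example_cache"],  # placeholder cache path
    ]
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # No shell involved: arguments cannot be word-split or injected.
            subprocess.run(cmd, check=True)
```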
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- jenkins/L0_Test.groovy (7 hunks)
- tests/scripts/perf-sanity/benchmark-serve.sh (2 hunks)
- tests/scripts/perf-sanity/benchmark_config.yaml (0 hunks)
- tests/scripts/perf-sanity/l0_dgx_b200.yaml (1 hunk)
- tests/scripts/perf-sanity/parse_benchmark_results.py (8 hunks)
- tests/scripts/perf-sanity/run_benchmark_serve.py (15 hunks)
💤 Files with no reviewable changes (1)
- tests/scripts/perf-sanity/benchmark_config.yaml
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use only spaces, no tabs; indent with 4 spaces.
Files:
tests/scripts/perf-sanity/parse_benchmark_results.py
tests/scripts/perf-sanity/run_benchmark_serve.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.
Files:
tests/scripts/perf-sanity/parse_benchmark_results.py
tests/scripts/perf-sanity/run_benchmark_serve.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).
Files:
tests/scripts/perf-sanity/parse_benchmark_results.py
tests/scripts/perf-sanity/run_benchmark_serve.py
🧠 Learnings (1)
📚 Learning: 2025-08-20T15:04:42.885Z
Learnt from: dbari
PR: NVIDIA/TensorRT-LLM#7095
File: docker/Dockerfile.multi:168-168
Timestamp: 2025-08-20T15:04:42.885Z
Learning: In docker/Dockerfile.multi, wildcard COPY for benchmarks (${CPP_BUILD_DIR}/benchmarks/*Benchmark) is intentionally used instead of directory copy because the benchmarks directory contains various other build artifacts during C++ builds, and only specific benchmark executables should be copied to the final image.
Applied to files:
tests/scripts/perf-sanity/benchmark-serve.sh
🪛 Shellcheck (0.11.0)
tests/scripts/perf-sanity/benchmark-serve.sh
[warning] 55-55: Quote this to prevent word splitting.
(SC2046)
🪛 Ruff (0.13.1)
tests/scripts/perf-sanity/parse_benchmark_results.py
62-79: Consider moving this statement to an else block
(TRY300)
186-186: Do not catch blind exception: Exception
(BLE001)
194-194: f-string without any placeholders
Remove extraneous f prefix
(F541)
197-197: f-string without any placeholders
Remove extraneous f prefix
(F541)
213-213: Do not catch blind exception: Exception
(BLE001)
tests/scripts/perf-sanity/run_benchmark_serve.py
384-384: subprocess call with shell=True identified, security issue
(S602)
384-384: f-string without any placeholders
Remove extraneous f prefix
(F541)
387-387: subprocess call with shell=True identified, security issue
(S602)
387-387: f-string without any placeholders
Remove extraneous f prefix
(F541)
609-609: f-string without any placeholders
Remove extraneous f prefix
(F541)
615-615: Do not catch blind exception: Exception
(BLE001)
618-618: f-string without any placeholders
Remove extraneous f prefix
(F541)
716-716: Consider moving this statement to an else block
(TRY300)
718-718: Do not catch blind exception: Exception
(BLE001)
728-728: subprocess call with shell=True identified, security issue
(S602)
731-731: subprocess call with shell=True identified, security issue
(S602)
734-734: subprocess call with shell=True identified, security issue
(S602)
734-734: f-string without any placeholders
Remove extraneous f prefix
(F541)
737-737: subprocess call with shell=True identified, security issue
(S602)
737-737: f-string without any placeholders
Remove extraneous f prefix
(F541)
740-740: Do not catch blind exception: Exception
(BLE001)
773-773: Local variable gpu_info is assigned to but never used
Remove assignment to unused variable gpu_info
(F841)
921-925: f-string without any placeholders
Remove extraneous f prefix
(F541)
938-938: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
951-951: os.chmod setting a permissive mask 0o755 on file or directory
(S103)
956-956: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (4)
tests/scripts/perf-sanity/l0_dgx_b200.yaml (1)
1-17: Config looks consistent with runner expectations; please confirm intended GPU count

YAML structure matches run_benchmark_serve.py (fields and concurrency_iterations). Confirm that gpus: 1 is intentional for the B200 sanity path (execution uses TP/EP; gpus is informational).
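For orientation, a hypothetical shape for such a config; the field names are taken from this review discussion, but the nesting, values, and entry names are invented for illustration and need not match the real l0_dgx_b200.yaml:

```yaml
# Illustrative sketch only; structure and values are assumptions.
server_configs:
  - name: r1_fp4_dep4
    gpus: 1                      # informational; execution uses TP/EP
    tp: 4
    ep: 4
    max_batch_size: 512
    max_num_tokens: 8192
    free_gpu_mem_fraction: 0.9
    concurrency_iterations:
      - name: con1_iter1_1024_1024
        concurrency: 1
        iterations: 1
```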
tests/scripts/perf-sanity/benchmark-serve.sh (2)
13-19: Validate config_file default/semantics

config_file defaults to $(pwd) but is later used as a filename appended to bench_dir. Ensure callers pass a filename (e.g., l0_dgx_b200.yaml) or change the default to a sensible filename; otherwise --config_file resolves to bench_dir/$(pwd), which is nonsense.
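One way to address this, sketched under the assumption that bench_dir holds the scripts; resolve_config and the default filename are hypothetical, not part of benchmark-serve.sh:

```shell
# Hypothetical sketch: treat config_file as a filename under bench_dir,
# defaulting to a real filename instead of $(pwd).
resolve_config() {
    local bench_dir="$1"
    local config_file="${2:-l0_dgx_b200.yaml}"   # default is a filename
    local config_path="${bench_dir}/${config_file}"
    if [[ -f "$config_path" ]]; then
        echo "$config_path"
    else
        echo "Config not found: $config_path" >&2
        return 1
    fi
}
```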
52-55: Quote mount aggregation in docker run to avoid word splitting (SC2046)

Unquoted $mount can split on spaces and break -v args.

Apply:

```diff
- $mount \
+ "$mount" \
```

Likely an incorrect or invalid review comment.
jenkins/L0_Test.groovy (1)
2235-2239: Confirm GPU type for Perf-Sanity k8s pod

createKubernetesPodConfig(..., "b100-ts2", ...) for DGX_B200 Perf-Sanity looks intentional but differs from values[0] ("b200-x8"). Verify the node label is correct.
/bot run --only-perf-sanity-test

PR_Github #19931 Bot args parsing error: usage: /bot [-h]

/bot run --only-perf-sanity-test

PR_Github #19944 Bot args parsing error: usage: /bot [-h]

099eb45 to 999a4c9
/bot run --stage-list "Perf-Sanity"

PR_Github #21172 [ run ] triggered by Bot

/bot run --stage-list "Perf-Sanity"

PR_Github #21179 [ run ] triggered by Bot

PR_Github #21172 [ run ] completed with state

PR_Github #21179 [ run ] completed with state
/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Perf-Sanity-Post-Merge-1"

PR_Github #21190 [ run ] triggered by Bot

PR_Github #21190 [ run ] completed with state

/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Perf-Sanity-Post-Merge-1"

PR_Github #21225 [ run ] triggered by Bot

PR_Github #21225 [ run ] completed with state

/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-1,DGX_B200-4_GPUs-PyTorch-Perf-Sanity-Post-Merge-1"

PR_Github #21284 [ run ] triggered by Bot

PR_Github #21786 [ run ] completed with state
9fc867a to 7e0c59e

/bot run

PR_Github #21839 [ run ] triggered by Bot. Commit:

PR_Github #21839 [ run ] completed with state

/bot run

PR_Github #21869 [ run ] triggered by Bot. Commit:

PR_Github #21869 [ run ] completed with state

7e0c59e to 7083622

/bot run

PR_Github #21879 [ run ] triggered by Bot. Commit:

PR_Github #21879 [ run ] completed with state
chzblych
left a comment
Approved for the jenkins/L0_Test.groovy changes.
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
7083622 to cf15a7a
/bot run

PR_Github #21983 [ run ] triggered by Bot. Commit:

PR_Github #21983 [ run ] completed with state

/bot run

PR_Github #22009 [ run ] triggered by Bot. Commit:

PR_Github #22009 [ run ] completed with state
…d B300 (NVIDIA#7985) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Description
This PR adds a trtllm-serve + client perf test for TRT-LLM. It currently supports only R1 FP4 model perf tests on 4xB200 and 4xB300.
You can define the perf tests in a YAML file, e.g. ${LLM_ROOT}/tests/scripts/perf-sanity/l0_dgx_b300.yaml, and then launch the test with pytest. For l0_dgx_b300.yaml, the server name is r1_fp4_dep4 and the client name is con1_iter1_1024_1024.
Currently the perf test system doesn't support history perf data database.
You can also trigger the test by
/bot run --stage-list "DGX_B200-4_GPUs-PyTorch-Perf-Sanity-Post-Merge-1,DGX_B300-4_GPUs-PyTorch-Perf-Sanity-Post-Merge-1".