add initial v1.0 data without rtx 6000 blackwell se #7894
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
📝 Walkthrough

Adds new RTX 6000 Pro Blackwell Server Edition benchmarks and tables; expands the FP4/FP8 model listings; replaces the Llama 4 Maverick FP8 data with Llama 4 Scout data; documents the new `ep_size`/`--ep` expert-parallelism flag; updates example commands (TP/PP/EP); adds MoE YAML snippets; and updates the troubleshooting guidance for KV cache and memory.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant U as User
    participant D as Perf Docs
    participant B as trtllm-bench
    participant R as TRT-LLM Runtime
    participant G as GPU(s)
    U->>D: Read benchmark guidance and flags (TP/PP/EP)
    Note over D: New flag ep_size / --ep for expert parallelism
    U->>B: Run with --tp, --pp [--ep]
    B->>R: Configure execution plan (TP/PP[/EP])
    alt Dense model
        R->>G: Launch kernels with TP/PP
    else MoE model (EP enabled)
        R->>G: Route tokens to experts (EP), apply dropout/padding
    end
    R-->>B: Throughput/latency results
    B-->>U: Output benchmark tables/logs
    Note over U,R: Troubleshoot KV cache % / memory if needed
```
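To make the "Configure execution plan" step concrete, here is a minimal illustrative sketch (not TensorRT-LLM source code) of how the TP/PP sizes named in the benchmark tables compose into a total GPU count. Treating EP as a sharding of MoE experts across the same ranks (i.e., it does not add GPUs by itself) is an assumption based on the review summary, not verified against the runtime.

```python
# Illustrative sketch: total GPU count for a run is the product of the
# tensor-parallel and pipeline-parallel sizes shown in the table headers
# (e.g. "4 GPUs TP2,PP2", "8 GPUs TP4,PP2").
def total_gpus(tp: int, pp: int) -> int:
    """Total GPUs = tensor-parallel size * pipeline-parallel size."""
    return tp * pp

print(total_gpus(2, 2))  # 4, matching "4 GPUs TP2,PP2"
print(total_gpus(4, 2))  # 8, matching "8 GPUs TP4,PP2"
```

This is only a bookkeeping aid for reading the table headers; the actual placement of ranks is decided by the runtime.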
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 0
🧹 Nitpick comments (3)
docs/source/performance/perf-overview.md (3)
75-85: Fix MD058: add blank lines around tables.

markdownlint flags these tables; add a blank line before and after each one to satisfy MD058. Apply diffs like this (pattern shown for each block):

```diff
 RTX 6000 Pro Blackwell Server Edition
+
 | Sequence Length (ISL/OSL) | **4 GPUs**<br/>TP2,PP2 (FP4) | **8 GPUs**<br/>TP4,PP2 (FP4) | ... |
 | 5000/500 | 2,662 | |
+
```

Repeat the same before/after fix for the tables starting at the listed ranges.
Also applies to: 100-112, 127-138, 153-165, 191-202, 217-225
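For reference, a minimal fragment (table values illustrative, taken from the snippet above) showing the blank lines MD058 requires around a table:

```markdown
RTX 6000 Pro Blackwell Server Edition

| Sequence Length (ISL/OSL) | **4 GPUs**<br/>TP2,PP2 (FP4) |
|---------------------------|------------------------------|
| 5000/500                  | 2,662                        |

Next paragraph continues here.
```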
101-101: Grammar: use singular “1 GPU”.

```diff
-| Sequence Length (ISL/OSL) | **1 GPUs**<br/>TP1,PP1 (FP4) | **2 GPUs**<br/>TP1,PP2 (FP4) | **4 GPUs**<br/>TP1,PP4 (FP4) | **8 GPUs**<br/>TP1,PP8 (FP4) |
+| Sequence Length (ISL/OSL) | **1 GPU**<br/>TP1,PP1 (FP4) | **2 GPUs**<br/>TP1,PP2 (FP4) | **4 GPUs**<br/>TP1,PP4 (FP4) | **8 GPUs**<br/>TP1,PP8 (FP4) |
```
63-63: Define acronyms (ISL/OSL/TP/PP/EP) and clarify ep_size scope.
- Add a one‑liner early to decode the acronyms; many tables use “ISL/OSL” before the Variables section introduces them.
- For `ep_size`, briefly clarify that it applies to MoE models only and how it composes with TP/PP.

Proposed insertion (place near the top, after the “Throughput Measurements” intro paragraph):

> Note: ISL = Input Sequence Length, OSL = Output Sequence Length, TP = Tensor Parallelism, PP = Pipeline Parallelism, EP = Expert Parallelism (for MoE models).

Also applies to: 268-269, 315-319, 343-347
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/source/performance/perf-overview.md (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
docs/source/performance/perf-overview.md
🪛 markdownlint-cli2 (0.18.1)
docs/source/performance/perf-overview.md
76-76: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
101-101: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
128-128: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
154-154: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
192-192: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
218-218: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (4)
docs/source/performance/perf-overview.md (4)
26-30: Confirm product naming and PR title intent for “RTX 6000 Pro Blackwell Server Edition”.
- The doc adds “RTX 6000 Pro Blackwell Server Edition” data, while the PR title says “without rtx 6000 blackwell se”. Please confirm whether RTX 6000 Blackwell SE content should be included in v1.0 and align the PR title and section headings accordingly.
- Also verify the official branding string (e.g., “NVIDIA RTX 6000 Blackwell”, “RTX 6000 Pro Blackwell”, “Server Edition”) and use it consistently across the Hardware list and tables.
Also applies to: 37-38, 75-75, 100-100, 127-127, 153-153, 191-191, 217-217
177-190: Scout vs Maverick: confirm coexistence or replacement.

The doc adds “Llama 4 Scout” and still contains “Llama 4 Maverick.” If Scout supersedes Maverick for FP8, either:
- keep both with a short note explaining differences/use‑cases, or
- remove Maverick to avoid confusion.
Also applies to: 191-202
46-49: Model lists vs tables: ensure consistency.

The FP4/FP8 model lists add Qwen3/DeepSeek entries; double‑check that every listed model has corresponding tables (and vice versa), or add a brief note if some are intentionally omitted.
Also applies to: 58-59
351-351: Confirm YAML key `enable_attention_dp`: verified.

The LLM args declare `enable_attention_dp` (tensorrt_llm/llmapi/llm_args.py: ~218, with a Field at ~1169 whose description is "Enable attention data parallel"), and the flag is referenced across CLI, tests, and runtime code; keep the doc key as-is.
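For context, a minimal sketch of such a YAML snippet, assuming it is supplied through an extra LLM API options file as the doc's MoE examples do; only the `enable_attention_dp` key is the one verified above, and the comment text is illustrative:

```yaml
# Extra LLM API options (sketch): enable attention data parallelism for a MoE run.
# Key verified against tensorrt_llm/llmapi/llm_args.py.
enable_attention_dp: true
```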
It's a mirror PR of #7848