[NV] dsr1 fp4 b200 trt agg mtp update by camiloamoreno · Pull Request #642 · InferenceMAX/InferenceMAX

camiloamoreno · 2026-02-05T01:34:59Z

This PR contains the following updates:

Update to the latest TRTLLM 1.2 release container, 1.2.0rc6.post3
Fine-tune the choice of parallelism in nvidia-master (reduce overlap between the TP8 and TP4 configurations)
Enable the piecewise CUDA graphs optimization in specific cases

Near the top of the benchmark script (L26-35), we enable specific optimizations, mainly distinguishing between cases with and without DP attention, including the choice of MTP aggressiveness.

As in the non-MTP FP4 agg version, we use Piecewise CUDA Graphs (https://nvidia.github.io/TensorRT-LLM/features/torch_compile_and_piecewise_cuda_graph.html), which run some components through CUDA graphs while running others eagerly, to get most of the CUDA graph benefit with lower overhead. We generate the capture_num_tokens list from MAX_NUM_TOKENS using the formula in the documentation.
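The PR does not reproduce the documented formula, so the following is only a sketch of the general shape such a capture list can take: powers of two for small token counts, then a fixed stride up to MAX_NUM_TOKENS, with MAX_NUM_TOKENS itself always included exactly once. The function name `capture_num_tokens_list` and the 128/256 thresholds are hypothetical; check the linked TensorRT-LLM page for the actual formula.

```python
from typing import List


def capture_num_tokens_list(max_num_tokens: int,
                            pow2_limit: int = 128,
                            stride: int = 256) -> List[int]:
    """Hypothetical capture-size list: powers of two, then a fixed stride.

    pow2_limit and stride are assumed values, not taken from the PR or
    the TensorRT-LLM documentation.
    """
    if max_num_tokens < 1:
        raise ValueError("max_num_tokens must be >= 1")
    if pow2_limit >= stride:
        # Keeps the two ranges disjoint so the list stays sorted and unique.
        raise ValueError("pow2_limit must be smaller than stride")
    tokens: List[int] = []
    # Dense coverage for small token counts: 1, 2, 4, ..., up to pow2_limit.
    t = 1
    while t <= min(pow2_limit, max_num_tokens):
        tokens.append(t)
        t *= 2
    # Coarser coverage above that: stride, 2*stride, ... <= max_num_tokens.
    tokens.extend(range(stride, max_num_tokens + 1, stride))
    # Always capture max_num_tokens itself, but only once: if it is a
    # multiple of stride (or a captured power of two) it is already last.
    if tokens[-1] != max_num_tokens:
        tokens.append(max_num_tokens)
    return tokens
```

For MAX_NUM_TOKENS=1000 this yields [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 768, 1000]. The final check plays the role the review attributes to the script's seq/modulo logic: the upper bound is appended only when the stride sequence has not already reached it.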

The CUDA graph max batch size is set to CONC, its natural upper limit, but reduced to CONC/4 when DP_ATTENTION is enabled.

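The batch-size rule above can be sketched as follows. The divisor of 4 comes from the description; the helper name, the integer division, and the floor of 1 for small CONC values are assumptions about how the script handles edge cases.

```python
def cuda_graph_max_batch_size(conc: int, dp_attention: bool) -> int:
    """Sketch of the CUDA graph max batch size rule described in the PR.

    Hypothetical helper: the floor of 1 and the integer division are
    assumptions, not confirmed details of the benchmark script.
    """
    if conc < 1:
        raise ValueError("conc must be >= 1")
    if not dp_attention:
        # The benchmark client never keeps more than CONC requests in
        # flight, so capturing graphs for larger batches is wasted work.
        return conc
    # With DP attention the limit is reduced to CONC / 4.
    return max(1, conc // 4)
```

For example, CONC=64 gives 64 without DP attention and 16 with it. One plausible reason for the reduction is that with attention data parallelism each rank only processes a share of the in-flight requests; the PR itself does not state the rationale.
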
claude · 2026-02-05T23:49:41Z

Claude finished @camiloamoreno's task in 1m 52s.

PR Review: [NV] dsr1 fp4 b200 trt agg mtp update

Read changed files and gather context
Validate master config (model prefixes, image accessibility)
Validate perf-changelog presence
Validate benchmark script (code style, MTP requirements)
Check for bugs and logic errors
Post review findings

LGTM — no blocking issues found.

Validation checklist:

Model prefix: dsr1 — valid
Image: nvcr.io#nvidia/tensorrt-llm/release:1.2.0rc6.post3 — public NGC image, valid
perf-changelog.yaml: Updated with matching PR link and description
--use-chat-template: Present in benchmark client (line 132)
Server launch formatting: Arguments on separate lines with \ continuations
Piecewise CUDA graphs logic: Capture token list generation correctly implements the documented formula; seq/modulo logic correctly avoids duplicates at the boundary
MAX_BATCH_SIZE: Correctly bounded to CONC/4 with DP attention, CONC otherwise; consistently used in both the YAML config and the server launch args
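One way to keep MAX_BATCH_SIZE consistent between the YAML config and the server launch args is to derive both from a single value. This sketch assumes TensorRT-LLM's `cuda_graph_config` and `torch_compile_config` option names and the `trtllm-serve` flags `--max_batch_size` and `--extra_llm_api_options`; verify them against the 1.2 documentation. The helper name `build_launch` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def build_launch(model: str, max_batch_size: int,
                 capture_num_tokens: List[int],
                 config_path: str) -> Tuple[Dict[str, Any], List[str]]:
    """Hypothetical helper: build the extra-options dict and the server
    command line from one max_batch_size so the two cannot drift apart."""
    extra_options: Dict[str, Any] = {
        # Assumed option names; check them against the TensorRT-LLM docs.
        "cuda_graph_config": {"max_batch_size": max_batch_size},
        "torch_compile_config": {
            "enable_piecewise_cuda_graph": True,
            "capture_num_tokens": capture_num_tokens,
        },
    }
    args = [
        "trtllm-serve", model,
        "--max_batch_size", str(max_batch_size),
        "--extra_llm_api_options", config_path,
    ]
    return extra_options, args
```

The returned dict would still need to be written to config_path as YAML (for example with PyYAML) before the server is started.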

kedarpotdar-nv

lgtm. this should get picked up after current GPTOSS run is complete

update nvidia master and benchmark options

444c823

camiloamoreno added the NVIDIA label Feb 5, 2026

camiloamoreno requested a review from a team as a code owner February 5, 2026 01:35

github-project-automation bot added this to InferenceMAX Board Feb 5, 2026

camiloamoreno marked this pull request as draft February 5, 2026 04:52

camiloamoreno added 3 commits February 5, 2026 13:24

bugfix

9670ce5

perf changelog

f0fd116

Merge branch 'main' into nv/dsr1-fp4-b200-trt-agg-mtp-260204

f5f523c

camiloamoreno changed the title ~~[WIP] [NV] dsr1 fp4 b200 trt agg mtp update~~ [NV] dsr1 fp4 b200 trt agg mtp update Feb 5, 2026

camiloamoreno requested a review from kedarpotdar-nv February 5, 2026 23:46

camiloamoreno marked this pull request as ready for review February 5, 2026 23:49

camiloamoreno added the sweep-enabled label Feb 5, 2026

kedarpotdar-nv approved these changes Feb 6, 2026

camiloamoreno wants to merge 4 commits into main from nv/dsr1-fp4-b200-trt-agg-mtp-260204
