[NV] dsr1 fp8 b200 trt agg mtp update by camiloamoreno · Pull Request #632 · InferenceMAX/InferenceMAX

camiloamoreno · 2026-02-05T00:24:09Z

Update to the latest TRTLLM 1.2 release container.
Fine-tune the choice of parallelism in nvidia-master.yaml (switch to TP only for most points).
Enable the latest optimizations offered by TRTLLM.

For most of the tests we switch to the TRTLLM backend for best performance.

As in the non-MTP FP8 agg code, we use piecewise CUDA graphs (https://nvidia.github.io/TensorRT-LLM/features/torch_compile_and_piecewise_cuda_graph.html), which let some components execute through CUDA graphs while other components run eagerly, gaining the benefit at lower overhead. We prepare the YAML configuration as per the documentation, including a "capture_num_tokens" list based partly on MAX_NUM_TOKENS. We still exclude a few narrow-concurrency scenarios for performance reasons; we are working to improve this and will update this config once that is done.
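To illustrate how a "capture_num_tokens" list can be derived from MAX_NUM_TOKENS, here is a minimal Python sketch. It is not the code in this PR: the power-of-two spacing and the helper names `build_capture_num_tokens` and `render_extra_config` are hypothetical, and the `torch_compile_config` / `enable_piecewise_cuda_graph` keys are assumed from the linked documentation and should be checked against the TRTLLM release in use.

```python
def build_capture_num_tokens(max_num_tokens: int) -> list[int]:
    """Return powers of two below max_num_tokens, followed by max_num_tokens itself.

    The spacing is an assumption made for illustration; the PR only says the
    list is based partly on MAX_NUM_TOKENS.
    """
    if max_num_tokens < 1:
        raise ValueError("max_num_tokens must be positive")
    sizes = []
    n = 1
    while n < max_num_tokens:
        sizes.append(n)
        n *= 2
    sizes.append(max_num_tokens)  # always capture the largest allowed size
    return sizes


def render_extra_config(max_num_tokens: int) -> str:
    """Render a YAML fragment as plain text (key names are assumptions)."""
    tokens = build_capture_num_tokens(max_num_tokens)
    lines = [
        "torch_compile_config:",
        "  enable_piecewise_cuda_graph: true",
        f"  capture_num_tokens: {tokens}",  # a Python list prints as a YAML flow sequence
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_extra_config(8192), end="")
```

Each entry is a token count for which a graph is captured, so a longer list trades more capture time and memory for more batch shapes that avoid eager execution.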

For some of the higher-concurrency points we use data-parallel attention with the DEEPGEMM MOE backend. This backend requires a few different optimizations than TRTLLM, as can be seen in lines 33-43. In particular, the flag ENABLE_CONFIGURABLE_MOE enables DEEPGEMM to use the MOE backend from the latest 1.3 code tree for its improved communication performance.
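The per-point backend choice described above can be sketched roughly as follows. This is an illustrative Python sketch only: the concurrency threshold, the `PointConfig` type, the `choose_point_config` helper, and the value "1" for ENABLE_CONFIGURABLE_MOE are all assumptions, not values taken from this PR.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff; the PR does not state where the switch happens.
DP_ATTENTION_MIN_CONCURRENCY = 256


@dataclass
class PointConfig:
    moe_backend: str           # "TRTLLM" or "DEEPGEMM"
    enable_attention_dp: bool  # data-parallel attention on or off
    env: dict[str, str] = field(default_factory=dict)


def choose_point_config(concurrency: int) -> PointConfig:
    """Pick backend settings for one benchmark point (illustrative only)."""
    if concurrency >= DP_ATTENTION_MIN_CONCURRENCY:
        # Higher-concurrency points: data-parallel attention with DEEPGEMM,
        # plus the flag named in the PR description (value is an assumption).
        return PointConfig(
            moe_backend="DEEPGEMM",
            enable_attention_dp=True,
            env={"ENABLE_CONFIGURABLE_MOE": "1"},
        )
    # Most points: TP only with the TRTLLM backend.
    return PointConfig(moe_backend="TRTLLM", enable_attention_dp=False)


if __name__ == "__main__":
    for conc in (4, 64, 512):
        print(conc, choose_point_config(conc))
```

Keeping the decision in one function makes it easy to see which points pick up the extra environment flag and which stay on the default path.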

Oseltamivir · 2026-02-05T01:45:23Z

Thanks for the PR. Please append the modifications to perf-changelog.yaml for a sweep.

b200-trt tag is removed

camiloamoreno added 2 commits February 4, 2026 15:43

update dsr1 fp8 b200 trt mtp configs

e65f6c1

Merge branch 'main' into nv/dsr1-fp8-b200-trt-agg-mtp-260203

a28d842

camiloamoreno requested review from kedarpotdar-nv and lishicheng1996-nv February 5, 2026 00:24

camiloamoreno requested a review from a team as a code owner February 5, 2026 00:24

camiloamoreno added the NVIDIA label Feb 5, 2026

github-project-automation bot added this to InferenceMAX Board Feb 5, 2026

camiloamoreno added 3 commits February 4, 2026 20:25

remove connditional

53c3621

Merge branch 'main' into nv/dsr1-fp8-b200-trt-agg-mtp-260203

a1edfa2

update perf-changelog

7408783

camiloamoreno changed the title ~~[WIP] [NV] dsr1 fp8 b200 trt agg mtp update~~ [NV] dsr1 fp8 b200 trt agg mtp update Feb 5, 2026

camiloamoreno added the sweep-enabled label Feb 5, 2026

kedarpotdar-nv and others added 8 commits February 4, 2026 23:01

Test with different runner tag

0a77ff1

b200-trt tag is removed

Merge branch 'main' into nv/dsr1-fp8-b200-trt-agg-mtp-260203

7cb875f

Update nvidia-master.yaml

c6b04ff

Update perf-changelog.yaml

7f38cbc

Update perf-changelog.yaml

22b8625

remove whitespace in yaml

ae5c997

update runner

11cbae2

add back b200-trt

3d5c7bb


[NV] dsr1 fp8 b200 trt agg mtp update #632
camiloamoreno wants to merge 13 commits into main from nv/dsr1-fp8-b200-trt-agg-mtp-260203
