Skip to content

[NV] DSR1 FP8 B200 Dynamo TRT (both MTP & STP)#616

Open
jthomson04 wants to merge 5 commits intomainfrom
nv/dsr1-fp8-b200-dynamo-trt
Open

[NV] DSR1 FP8 B200 Dynamo TRT (both MTP & STP)#616
jthomson04 wants to merge 5 commits intomainfrom
nv/dsr1-fp8-b200-dynamo-trt

Conversation

@jthomson04
Copy link
Collaborator

No description provided.

Signed-off-by: jthomson04 <jothomson@nvidia.com>
@jthomson04 jthomson04 requested a review from a team as a code owner February 2, 2026 19:50
@jthomson04 jthomson04 marked this pull request as draft February 2, 2026 19:52
@functionstackx functionstackx changed the title [NV] DSR1 FP8 B200 Dynamo TRT [NV] DSR1 FP8 B200 Dynamo TRT (both MTP & STP) Feb 5, 2026
Signed-off-by: jthomson04 <jothomson@nvidia.com>
@jthomson04 jthomson04 marked this pull request as ready for review February 6, 2026 01:07
@claude
Copy link
Contributor

claude bot commented Feb 6, 2026

Claude finished @jthomson04's task in 1m 48s —— View job


Review Summary

1 blocking issue, 2 warnings found.

🔴 perf-changelog.yaml: PR link references #647 but this is PR #616. Must be corrected before merge. (See comment)

🟡 nvidia-master.yaml (8k1k MTP): Two entries with conc-list: [8] at lines 607 and 621, each pointing to different CONFIG_FILEs (ctx1_gen6_tp8_batch8 vs ctx1_gen2_tp8_batch32). Verify the sweep system can handle duplicate concurrency values without result collision. (See comment)

🟡 nvidia-master.yaml (8k1k STP): Three entries with conc-list: [128] at lines 736, 763, and 777, spanning both low-latency and high-throughput sections with different worker/decode configs. Same collision concern. (See comment)

Everything else looks good — image is on NGC (nvcr.io), model prefix is valid (dsr1), enroot import pattern is present in the launch script, and the config structure is consistent with the existing FP4 B200 entries.


Task List

  • Read and analyze all changed files
  • Validate master config + perf-changelog consistency
  • Validate container image accessibility
  • Validate model prefixes
  • Validate launch script enroot import pattern
  • Check for bugs, logic errors, and other issues
  • Post final review

Signed-off-by: jthomson04 <jothomson@nvidia.com>
Comment on lines 607 to 621
conc-list: [8]
prefill:
num-worker: 1
tp: 8
ep: 8
dp-attn: true
additional-settings:
# https://github.com/ishandhanani/srt-slurm/blob/sa-submission-q1-2026/recipes/trtllm/b200-fp8/8k1k/mtp/ctx1_gen6_tp8_batch8_eplb0_mtp3_8.yaml
- "CONFIG_FILE=recipes/trtllm/b200-fp8/8k1k/mtp/ctx1_gen6_tp8_batch8_eplb0_mtp3_8.yaml"
decode:
num-worker: 6
tp: 8
dp-attn: false
- spec-decoding: "mtp"
conc-list: [8]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING: Duplicate conc-list: [8] in the 8k1k MTP low-latency section. Both entries target concurrency 8 but with different CONFIG_FILEs (ctx1_gen6_tp8_batch8 vs ctx1_gen2_tp8_batch32).
Why it matters: If the sweep system deduplicates by concurrency value, one of these configs will be silently dropped. If it doesn't, the results will collide with the same concurrency key.
Fix: Verify this is intentional. If both configs should be tested at concurrency 8, consider whether the system supports this. Otherwise, assign a different concurrency value to one of them.

Comment on lines 736 to 777
- conc-list: [128]
prefill:
num-worker: 1
tp: 8
ep: 8
dp-attn: true
additional-settings:
# https://github.com/ishandhanani/srt-slurm/blob/sa-submission-q1-2026/recipes/trtllm/b200-fp8/8k1k/stp/ctx1_gen4_tp8_batch32_eplb0_mtp0_128.yaml
- "CONFIG_FILE=recipes/trtllm/b200-fp8/8k1k/stp/ctx1_gen4_tp8_batch32_eplb0_mtp0_128.yaml"
decode:
num-worker: 4
tp: 8
dp-attn: false
- conc-list: [96]
prefill:
num-worker: 1
tp: 8
ep: 8
dp-attn: true
additional-settings:
# https://github.com/ishandhanani/srt-slurm/blob/sa-submission-q1-2026/recipes/trtllm/b200-fp8/8k1k/stp/ctx1_gen6_tp8_batch16_eplb0_mtp0_96.yaml
- "CONFIG_FILE=recipes/trtllm/b200-fp8/8k1k/stp/ctx1_gen6_tp8_batch16_eplb0_mtp0_96.yaml"
decode:
num-worker: 6
tp: 8
dp-attn: false
# Non-MTP (STP) configurations - High throughput (DP attention)
- conc-list: [128]
prefill:
num-worker: 1
tp: 8
ep: 8
dp-attn: true
additional-settings:
# https://github.com/ishandhanani/srt-slurm/blob/sa-submission-q1-2026/recipes/trtllm/b200-fp8/8k1k/stp/ctx1_gen1_dep8_batch128_eplb0_mtp0_128.yaml
- "CONFIG_FILE=recipes/trtllm/b200-fp8/8k1k/stp/ctx1_gen1_dep8_batch128_eplb0_mtp0_128.yaml"
decode:
num-worker: 1
tp: 8
ep: 8
dp-attn: true
- conc-list: [128]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING: Three entries with conc-list: [128] in the 8k1k STP section — one at line 736 (low-latency/TP attention), one at line 763 (high-throughput/DP attention, batch128), and one at line 777 (high-throughput/DP attention, batch64).
Why it matters: Same concern as the duplicate conc-list: [8] — if results are keyed by concurrency, these will collide. The entries have different decode configs, but the concurrency value is identical.
Fix: Confirm the sweep system can disambiguate entries with the same concurrency value but different worker configurations. If not, adjust the concurrency values to be unique.

Signed-off-by: jthomson04 <jothomson@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants