[NV] DSR1 FP8 B200 Dynamo TRT (both MTP & STP)#616
Claude finished @jthomson04's task in 1m 48s.
Review Summary: 1 blocking issue, 2 warnings found (🔴 🟡 🟡). Everything else looks good: the image is on NGC (…)
| conc-list: [8] | ||
| prefill: | ||
| num-worker: 1 | ||
| tp: 8 | ||
| ep: 8 | ||
| dp-attn: true | ||
| additional-settings: | ||
| # https://github.com/ishandhanani/srt-slurm/blob/sa-submission-q1-2026/recipes/trtllm/b200-fp8/8k1k/mtp/ctx1_gen6_tp8_batch8_eplb0_mtp3_8.yaml | ||
| - "CONFIG_FILE=recipes/trtllm/b200-fp8/8k1k/mtp/ctx1_gen6_tp8_batch8_eplb0_mtp3_8.yaml" | ||
| decode: | ||
| num-worker: 6 | ||
| tp: 8 | ||
| dp-attn: false | ||
| - spec-decoding: "mtp" | ||
| conc-list: [8] |
🟡 WARNING: Duplicate conc-list: [8] in the 8k1k MTP low-latency section. Both entries target concurrency 8 but with different CONFIG_FILEs (ctx1_gen6_tp8_batch8 vs ctx1_gen2_tp8_batch32).
Why it matters: If the sweep system deduplicates by concurrency value, one of these configs will be silently dropped. If it does not, both runs will report results under the same concurrency key and collide.
Fix: Verify this is intentional. If both configs should be tested at concurrency 8, consider whether the system supports this. Otherwise, assign a different concurrency value to one of them.
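A quick way to check for this before merging is to scan the sweep entries for repeated concurrency values. The sketch below is illustrative only: it assumes the entries have already been loaded into a flat list of dicts, and the `config` field and the `find_duplicate_concurrencies` helper are hypothetical names, not part of the sweep system. The config labels are the abbreviated names used in this comment, not full paths.

```python
from collections import defaultdict

# Hypothetical, flattened view of the two sweep entries discussed above.
# The "config" labels are the abbreviated names from the comment, not full paths.
entries = [
    {"conc-list": [8], "config": "ctx1_gen6_tp8_batch8"},
    {"conc-list": [8], "config": "ctx1_gen2_tp8_batch32"},
]

def find_duplicate_concurrencies(entries):
    """Return {concurrency: [configs]} for every concurrency used by more than one entry."""
    seen = defaultdict(list)
    for entry in entries:
        for conc in entry["conc-list"]:
            seen[conc].append(entry["config"])
    return {conc: configs for conc, configs in seen.items() if len(configs) > 1}

duplicates = find_duplicate_concurrencies(entries)
for conc, configs in duplicates.items():
    print(f"concurrency {conc} is used by {len(configs)} entries: {configs}")
```

An empty result would mean every concurrency value maps to exactly one config, so the collision question does not arise.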
| - conc-list: [128] | ||
| prefill: | ||
| num-worker: 1 | ||
| tp: 8 | ||
| ep: 8 | ||
| dp-attn: true | ||
| additional-settings: | ||
| # https://github.com/ishandhanani/srt-slurm/blob/sa-submission-q1-2026/recipes/trtllm/b200-fp8/8k1k/stp/ctx1_gen4_tp8_batch32_eplb0_mtp0_128.yaml | ||
| - "CONFIG_FILE=recipes/trtllm/b200-fp8/8k1k/stp/ctx1_gen4_tp8_batch32_eplb0_mtp0_128.yaml" | ||
| decode: | ||
| num-worker: 4 | ||
| tp: 8 | ||
| dp-attn: false | ||
| - conc-list: [96] | ||
| prefill: | ||
| num-worker: 1 | ||
| tp: 8 | ||
| ep: 8 | ||
| dp-attn: true | ||
| additional-settings: | ||
| # https://github.com/ishandhanani/srt-slurm/blob/sa-submission-q1-2026/recipes/trtllm/b200-fp8/8k1k/stp/ctx1_gen6_tp8_batch16_eplb0_mtp0_96.yaml | ||
| - "CONFIG_FILE=recipes/trtllm/b200-fp8/8k1k/stp/ctx1_gen6_tp8_batch16_eplb0_mtp0_96.yaml" | ||
| decode: | ||
| num-worker: 6 | ||
| tp: 8 | ||
| dp-attn: false | ||
| # Non-MTP (STP) configurations - High throughput (DP attention) | ||
| - conc-list: [128] | ||
| prefill: | ||
| num-worker: 1 | ||
| tp: 8 | ||
| ep: 8 | ||
| dp-attn: true | ||
| additional-settings: | ||
| # https://github.com/ishandhanani/srt-slurm/blob/sa-submission-q1-2026/recipes/trtllm/b200-fp8/8k1k/stp/ctx1_gen1_dep8_batch128_eplb0_mtp0_128.yaml | ||
| - "CONFIG_FILE=recipes/trtllm/b200-fp8/8k1k/stp/ctx1_gen1_dep8_batch128_eplb0_mtp0_128.yaml" | ||
| decode: | ||
| num-worker: 1 | ||
| tp: 8 | ||
| ep: 8 | ||
| dp-attn: true | ||
| - conc-list: [128] |
🟡 WARNING: Three entries with conc-list: [128] in the 8k1k STP section: one at line 736 (low-latency/TP attention), one at line 763 (high-throughput/DP attention, batch128), and one at line 777 (high-throughput/DP attention, batch64).
Why it matters: Same concern as the duplicate conc-list: [8]. If results are keyed by concurrency alone, these three will collide. The entries have different decode configs, but the concurrency value is identical.
Fix: Confirm the sweep system can disambiguate entries with the same concurrency value but different worker configurations. If not, adjust the concurrency values to be unique.
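To make the collision concrete, here is a minimal sketch comparing a concurrency-only key with a composite key. How the sweep system actually keys its results is not known from this PR; the `collect` helper and the `label` field are hypothetical, and the labels simply restate the three entries described above.

```python
# Hypothetical labels for the three conc-list: [128] entries described above.
entries = [
    {"conc-list": [128], "label": "low-latency/TP attention"},
    {"conc-list": [128], "label": "high-throughput/DP attention, batch128"},
    {"conc-list": [128], "label": "high-throughput/DP attention, batch64"},
]

def collect(entries, key_fn):
    """Store one result slot per key; a later entry with the same key overwrites the earlier one."""
    results = {}
    for entry in entries:
        for conc in entry["conc-list"]:
            results[key_fn(entry, conc)] = entry["label"]
    return results

# Keyed by concurrency alone: all three entries land in the same slot.
by_concurrency = collect(entries, lambda entry, conc: conc)
# Keyed by (concurrency, config label): each entry keeps its own slot.
by_composite = collect(entries, lambda entry, conc: (conc, entry["label"]))

print(len(by_concurrency))  # 1
print(len(by_composite))    # 3
```

If the real system behaves like the first case, only the last entry's result would survive; a key that includes something entry-specific, such as the CONFIG_FILE, avoids that without changing the concurrency values.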
Signed-off-by: jthomson04 <jothomson@nvidia.com>
No description provided.