[TRTLLM-8936][test] Add disagg and wideep multi-node multi-gpu test cases #9356
Conversation
📝 Walkthrough

Introduces a comprehensive disaggregated inference benchmarking framework, including SLURM job submission, YAML configuration management, job execution orchestration, result parsing for performance and accuracy metrics, and an extensive test harness with parametrized test configurations across multiple hardware and model scenarios.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test Runner (pytest)
    participant Config as ConfigLoader
    participant JobMgr as JobManager
    participant Executor as SlurmExecutor
    participant Parser as Result Parser
    Test->>Config: scan_configs()
    Config->>Config: load YAML, apply env overrides
    Config-->>Test: TestConfig (with metrics_config)
    Test->>JobMgr: submit_job(test_config)
    JobMgr->>Executor: write config, sbatch
    Executor-->>JobMgr: job_id
    JobMgr->>JobMgr: wait_for_completion(job_id)
    Note over JobMgr: poll sacct, check early failure
    JobMgr-->>Test: completion status
    Test->>JobMgr: check_result(job_id, test_config)
    JobMgr->>Parser: parse logs with metrics_config
    Parser-->>JobMgr: results (perf or accuracy)
    JobMgr->>JobMgr: backup_logs, cleanup
    JobMgr-->>Test: result dict
    Test->>Test: assert results
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 13
🧹 Nitpick comments (51)
tests/integration/defs/perf/disagg/pyproject.toml (2)
9-15: Standardize version constraint syntax across dependencies.

Version constraints use inconsistent syntax: explicit bounds (lines 11, 12, 15) vs. caret notation (lines 13, 14). For maintainability, prefer explicit bounds consistently.
```diff
 [tool.poetry.dependencies]
 python = ">=3.10"
 pytest = ">=8.4.2,<9.0.0"
 pandas = ">=2.3.2,<3.0.0"
-psutil = "^7.1.0"
-pyyaml = "^6.0.3"
+psutil = ">=7.1.0,<8.0.0"
+pyyaml = ">=6.0.3,<7.0.0"
 scipy = ">=1.11.0,<2.0.0"
```
5-5: Consider adding email to author metadata.

Poetry conventions typically include author email:

```toml
authors = ["Fredric Zhu <email@example.com>"]
```

tests/integration/defs/perf/disagg/test_configs/README.md (1)
7-12: Minor markdown formatting issue: missing language identifier for fenced code block.

Line 7 starts a fenced code block without specifying a language identifier. Update to include the appropriate language for better rendering and linting compliance.
To fix the markdown linting issue, add a language identifier (e.g., `text`) to the opening fence.

test_configs/examples/disaggregated/slurm/benchmark/submit.sh (2)
106-124: Consider parameterizing sbatch arguments instead of hardcoding positional args.

The sbatch call passes 27 positional arguments across multiple lines (106-124), making the mapping between `run_single` parameters and sbatch positional arguments fragile and difficult to maintain. A typo or accidental reordering would be easy to introduce and hard to spot.

Consider refactoring to use a configuration file (YAML/JSON) or environment variables to pass parameters to the SLURM script instead of positional arguments. This would align with the new YAML-based configuration framework (referenced in the PR summary) and reduce maintenance burden.

Alternatively, document the parameter mapping clearly in a comment above `run_single` to aid future maintenance.
131-140: Hardcoded example configurations lack discoverability.

Lines 131-140 define 10 hardcoded benchmark configurations via `run_single` calls. These examples are not discoverable or filterable programmatically. If users need to run a subset (e.g., via the `-k` filtering mentioned in the PR objective), they cannot easily do so from this script.

Consider extracting these example configurations into a separate YAML/JSON file (or leveraging the existing configuration framework under `tests/integration/defs/perf/disagg/test_configs/`), as sketched below, so that:

- Configurations can be listed, filtered, and selected programmatically.
- The script can load and iterate configurations from a central source.
- The script aligns with the PR's goal to "add support for a test list and -k filtering."

The commented-out 8k-1k variants (lines 142-156) also suggest that configuration management should be more structured.
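A minimal sketch of the manifest-driven approach, assuming a hypothetical `benchmark_runs.yaml` file and illustrative entry keys (`isl`, `osl`, `ctx_num`, `gen_num`) that are not part of this PR:

```python
# Hedged sketch: drive run_single-style submissions from a YAML manifest
# instead of hardcoded calls. File name and keys are assumptions.
import subprocess

import yaml


def run_from_manifest(manifest_path: str) -> None:
    with open(manifest_path) as f:
        runs = yaml.safe_load(f) or []
    for run in runs:
        # Map each named entry onto the script's positional arguments.
        args = [str(run[key]) for key in ("isl", "osl", "ctx_num", "gen_num")]
        subprocess.run(["bash", "submit.sh", *args], check=True)


if __name__ == "__main__":
    run_from_manifest("benchmark_runs.yaml")
```

With this shape, listing and `-k`-style filtering become dictionary operations over the manifest rather than edits to the shell script.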
tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-UCX.yaml (1)
1-10: Consider adding dataset_file to metadata for consistency.

Some configuration files in this PR include `dataset_file` in the metadata section (e.g., the wideep variants), while others don't. For consistency across all test configurations, consider adding it here as well, since it's referenced in the benchmark section at Line 27.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-UCX.yaml (2)
1-10: Consider adding dataset_file to metadata for consistency.

This configuration is missing `dataset_file` in the metadata section, while other configs in this PR include it. Adding it would improve consistency across test configurations.
24-24: Inconsistent concurrency_list format across configurations.

This config uses space-separated values (`512 1075`), while other configs use quoted strings (e.g., `'2150'`, `'1075'`). Standardizing the format across all configurations would improve maintainability (see the normalization sketch below).
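If both shapes must be tolerated during a transition, a small normalizer is enough to make downstream tooling indifferent to the format; the helper name here is ours:

```python
# Normalize the two observed concurrency_list shapes into a list of ints:
# a space-separated string ("512 1075") or a list of quoted strings.
from typing import List, Union


def normalize_concurrency(value: Union[str, List[Union[str, int]]]) -> List[int]:
    if isinstance(value, str):
        return [int(token) for token in value.split()]
    return [int(item) for item in value]


assert normalize_concurrency("512 1075") == [512, 1075]
assert normalize_concurrency(["2150", "1075"]) == [2150, 1075]
```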
tests/integration/defs/perf/disagg/test_configs/wideep/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb288_mtp3_ccb-UCX.yaml (1)

25-25: Inconsistent concurrency_list format across configurations.

This config uses space-separated values (`512 1075`), while other configs use quoted strings (e.g., `'2150'`, `'1075'`). Standardizing the format across all configurations would improve maintainability.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-UCX.yaml (1)
1-10: Consider adding dataset_file to metadata for consistency.

This configuration is missing `dataset_file` in the metadata section, while other configs in this PR (particularly the wideep variants) include it. Adding it would improve consistency across test configurations.

tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-r1-fp4_8k1k_ctx2_gen1_dep32_bs128_eplb288_mtp3_ccb-DEFAULT.yaml (1)
58-75: Duplicate `128` entry in CUDA graph batch sizes.

`cuda_graph_config.batch_sizes` includes `128` twice (once in the main sequence and once at the end). It's harmless but noisy and may confuse future readers or generators.

You can simplify by dropping the duplicate:

```diff
   - 768
   - 1024
-  - 2048
-  - 128
+  - 2048
```

tests/integration/defs/perf/disagg/utils/config_validator.py (1)
18-33: Docstrings advertise ValueError/FileNotFoundError, but implementation only asserts.

`validate_test_config` and the private `_validate_*` helpers document `ValueError`/`FileNotFoundError` in their Raises sections, but the current implementation only uses `assert`, which surfaces as `AssertionError` (and can be optimized out with `-O`).

Either adjust the docs to reflect `AssertionError` only, or convert the assertions into explicit exceptions, for example:

```diff
-    if mtp_size > 0:
-        assert gen_max_tokens == (gen_max_batch_size * (mtp_size + 1)), \
-            "config error: gen_max_tokens != gen_max_batch_size * (mtp_size + 1)"
+    if mtp_size > 0 and gen_max_tokens != gen_max_batch_size * (mtp_size + 1):
+        raise AssertionError(
+            "config error: gen_max_tokens != gen_max_batch_size * (mtp_size + 1)"
+        )
```

and similarly for the streaming and max-seq-len checks (a small helper sketch follows below).
Also applies to: 45-84
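One way to keep the assert-style brevity across all three checks without `-O` stripping is a tiny helper; the `require` name and sample values are ours, not from the PR:

```python
def require(condition: bool, message: str) -> None:
    """Raise AssertionError explicitly, so it survives python -O."""
    if not condition:
        raise AssertionError(message)


# Usage mirroring the MTP check above (illustrative values: 128 * (3 + 1) == 512).
mtp_size, gen_max_tokens, gen_max_batch_size = 3, 512, 128
require(
    mtp_size == 0 or gen_max_tokens == gen_max_batch_size * (mtp_size + 1),
    "config error: gen_max_tokens != gen_max_batch_size * (mtp_size + 1)",
)
```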
tests/integration/defs/perf/disagg/utils/trackers.py (3)
39-54: Align `get_timestamps` type hints with returned values.

`get_timestamps()` is annotated as returning `Dict[str, str]`, but `total_time__sec` is a float in both branches. This can confuse callers and type-checkers.

Consider either changing the return type to something like `Dict[str, object]`/`Dict[str, Union[str, float]]`, or converting the duration to a string before returning.
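A minimal sketch of the widened annotation, assuming the field names described above; the body is illustrative, not the tracker's real logic:

```python
from typing import Dict, Union


def get_timestamps(start: float, end: float) -> Dict[str, Union[str, float]]:
    return {
        "start_time": "2025-01-01 00:00:00",  # formatted string in the real code
        "total_time__sec": end - start,       # float in both branches, hence the union
    }
```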
75-95: Guard against misconfigured `OUTPUT_PATH` before running session-collect.

`end_and_collect()` assumes `EnvManager.get_output_path()` returns a real directory. If it's still the default placeholder (starts with `<`) or otherwise invalid, `log_file` will point to a non-existent path and `run_job` will fail later with a less clear error.

It would be more robust to:

- Check `not output_path` or `output_path.startswith("<")` and, in that case, log a clear configuration error and return `False` without calling `run_job`.
- Optionally mirror that guard in `_update_csv_timestamps()` so you never try to touch a CSV in an invalid location.

This keeps failures deterministic and surfaces configuration issues earlier (a sketch follows).
96-118: Narrow the catch-all exception in `_update_csv_timestamps`.

Catching bare `Exception` around `pd.read_csv`/`to_csv` will also swallow unexpected programming errors (schema issues, type errors, etc.), and only a generic message is logged.

To improve diagnosability (and satisfy Ruff BLE001) consider:

- Catching only expected I/O-related exceptions (`OSError`, `pd.errors.EmptyDataError`, etc.), and
- Letting truly unexpected exceptions bubble up, or at least logging them with full context (CSV path, `repr(e)`).

This keeps the tracker resilient while avoiding over-broad error masking.
tests/integration/defs/perf/disagg/utils/logger.py (1)
108-136: Import-time auto-configuration is side-effectful.

Importing `utils.logger` immediately:

- Constructs the global `logger`,
- Calls `EnvManager.get_output_path()`, potentially creating directories, and
- Tries to attach a file handler inside a broad `except Exception` block.

That's convenient for the disagg harness, but if this module ever gets reused as a library it can surprise callers and complicate unit tests.

If you anticipate reuse, consider moving the EnvManager/OUTPUT_PATH detection behind an explicit initializer (e.g., `init_logging()` called from the test entrypoint, as sketched below) and keeping this module focused on logger construction. You could also narrow the bottom-level `except Exception` to expected import/path errors to align with Ruff BLE001.
tests/integration/defs/perf/disagg/reporting/accuracy_parser.py (1)

49-60: Improve log-file read error handling and exception scope.

`parse_and_validate()` wraps the log-file read in a bare `except Exception` and encodes the message into the returned result, but doesn't emit a structured log entry. This both triggers Ruff BLE001 and makes debugging harder.

You can keep the external behavior while tightening things up by:

- Catching specific I/O/decoding issues (`OSError`, `UnicodeDecodeError`, etc.), and
- Logging the failure via `logger.error(...)` before returning the `success=False` result.

That way callers still get a clean `AccuracyValidationResult`, and you gain better observability.

tests/integration/defs/perf/disagg/simple_collect.py (3)
2-11: Docstring is out of sync with actual outputs.

The top-level docstring lists four generated files (CSV + three `.txt` files), but the script also writes `trtllm_version.txt` and includes it in the summary.

Recommend updating the docstring to mention `trtllm_version.txt` so users know to expect that artifact.
37-41: Avoid fully silent `except Exception: pass` patterns for diagnostics.

In `collect_system_info()` and `TextWriter`, several blocks use broad `except Exception` with either no logging or just `pass`. For a diagnostics script this keeps things resilient, but it also:

- Hides unexpected bugs (regex issues, parsing mistakes, I/O problems), and
- Leaves only "unknown" values with no indication of why.

A more informative pattern would be to:

- Catch narrower, expected failure modes (`FileNotFoundError`, `subprocess.CalledProcessError`, `re.error`, etc.), and/or
- Log a short DEBUG/INFO message when you fall back to `"unknown"`.

That preserves robustness while improving debuggability and should also address Ruff's BLE001/S110 concerns in these regions (see the sketch after the cross-reference below).
Also applies to: 77-82, 97-122, 209-210, 229-230
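A sketch of the narrower-failure pattern for one probe, following the `"unknown"` fallback convention described above; the function name and print-based logging are illustrative:

```python
import subprocess


def probe_driver_version() -> str:
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=10,
        )
        lines = out.stdout.strip().splitlines()
        return lines[0] if lines else "unknown"
    except (FileNotFoundError, subprocess.CalledProcessError,
            subprocess.TimeoutExpired) as e:
        # Expected failure modes only; note *why* we fell back.
        print(f"driver version probe failed, using 'unknown': {e!r}")
        return "unknown"
```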
238-293: Consider reducing or gating the TensorRT-LLM retry delay.

`write_trtllm_version()` can take up to ~70 seconds if `tensorrt_llm` isn't importable or hangs: two `subprocess.run(..., timeout=30)` calls plus a hard-coded 10-second `sleep` between them. For environments where TRT-LLM isn't installed, that's a lot of latency for a non-essential text file.

You might want to either:

- Lower the timeouts/sleep, or
- Make the second attempt conditional on an env flag (e.g., `DISAGG_RETRY_TRTLLM=1`), so default runs fail fast but more patient retries are opt-in (sketched below).

Behavior stays unchanged when the flag is set, while speeding up misconfigured or TRT-LLM-less runs.
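A hedged sketch of the opt-in retry, using the `DISAGG_RETRY_TRTLLM` flag name proposed above; the function name and structure are illustrative, not the script's current code:

```python
import os
import subprocess
import time


def get_trtllm_version() -> str:
    cmd = ["python3", "-c", "import tensorrt_llm; print(tensorrt_llm.__version__)"]
    attempts = 2 if os.environ.get("DISAGG_RETRY_TRTLLM") == "1" else 1
    for i in range(attempts):
        try:
            return subprocess.run(cmd, capture_output=True, text=True,
                                  check=True, timeout=30).stdout.strip()
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
            if i + 1 < attempts:
                time.sleep(10)  # pay the delay only when retries are enabled
    return "unknown"
```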
examples/disaggregated/slurm/benchmark/gen_worker_config.py (2)
25-49: Docstring is out of sync with the actual function signatureThe
gen_config_filedocstring still documents parameters likeconfig_path,model_path,num_ctx_servers,worker_start_port, andserver_portthat are not in the current signature, and it doesn’t describe several real parameters (work_dir,ctx_free_gpu_memory_fraction,gen_gpu_memory_fraction,mtp_size,cache_transceiver_max_num_tokens, etc.).This is likely to confuse future callers trying to use this function programmatically.
Apply an update like:
def gen_config_file(work_dir: str, @@ - """ - Generate configuration YAML file for disaggregated inference. - - Args: - config_path: Path to save the config file - model_path: Path to the model - num_ctx_servers: Number of context servers - ctx_tp_size: Tensor parallel size for context servers - ctx_pp_size: Pipeline parallel size for context servers - ctx_batch_size: Batch size for context servers - ctx_max_num_tokens: Max number of tokens for context servers - ctx_max_seq_len: Max sequence length for context servers - ctx_free_gpu_memory_fraction: Free GPU memory fraction for context servers - ctx_enable_attention_dp: Enable attention DP for context servers - num_gen_servers: Number of generation servers - gen_tp_size: Tensor parallel size for generation servers - gen_pp_size: Pipeline parallel size for generation servers - gen_batch_size: Batch size for generation servers - gen_max_num_tokens: Max number of tokens for generation servers - gen_enable_attention_dp: Enable attention DP for generation servers - gen_gpu_memory_fraction: GPU memory fraction for generation servers - eplb_num_slots: Number of slots for eplb - worker_start_port: Start port for workers - server_port: Server port - """ + """ + Generate ctx/gen worker configuration YAML files for disaggregated inference. + + Args: + work_dir: Directory where `ctx_config.yaml` and `gen_config.yaml` will be written. + ctx_tp_size: Tensor parallel size for context workers. + ctx_pp_size: Pipeline parallel size for context workers. + ctx_batch_size: Max batch size for context workers. + ctx_max_num_tokens: Max number of tokens for context workers. + ctx_max_seq_len: Max sequence length for context workers. + ctx_free_gpu_memory_fraction: Fraction of GPU memory reserved for KV cache on ctx workers. + ctx_enable_attention_dp: Whether to enable attention data parallel on ctx workers. + gen_tp_size: Tensor parallel size for gen workers. + gen_pp_size: Pipeline parallel size for gen workers. + gen_batch_size: Max batch size for gen workers. + gen_max_num_tokens: Max number of tokens for gen workers. + gen_max_seq_len: Max sequence length for gen workers. + gen_enable_attention_dp: Whether to enable attention data parallel on gen workers. + gen_gpu_memory_fraction: Fraction of GPU memory reserved for KV cache on gen workers. + eplb_num_slots: Number of MOE load balancer slots (0 disables load balancer config). + mtp_size: Number of next‑N predict layers for MTP speculative decoding (0 disables). + cache_transceiver_max_num_tokens: Max tokens in cache transceiver buffer for both ctx/gen. + """
124-137: Clarify `moe_config.load_balancer` semantics vs test config expectations.

Here `gen_config['moe_config']['load_balancer']` is set to the path of `moe_load_balancer.yaml`:

```python
moe_load_balancer_file = os.path.join(work_dir, "moe_load_balancer.yaml")
...
gen_config['moe_config']['load_balancer'] = moe_load_balancer_file
```

In contrast, the disagg test configs (e.g. `tests/integration/defs/perf/disagg/test_configs/...`) store `load_balancer` as an inline dict, and `extract_config_fields` in `tests/integration/defs/perf/disagg/utils/common.py` assumes:

```python
eplb_slots = (
    config_data["worker_config"]["gen"]
    .get("moe_config", {})
    .get("load_balancer", {})
    .get("num_slots", 0)
)
```

So:

- For worker configs produced here, a string path is probably what the runtime expects.
- For test configs, a nested dict is expected for offline analysis.

To avoid accidental reuse of worker configs where a dict is required, consider:

- Documenting clearly that this script outputs worker-side configs, not the top-level disagg test YAML shape; or
- Using a different key name (e.g. `load_balancer_file`) at worker level if the runtime allows.

Please confirm that no code paths attempt to run `extract_config_fields` or similar dict-based access on the YAMLs generated by this script.
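If both shapes must coexist, the reader side can also be made shape-tolerant; a hedged sketch under that assumption, with the helper name ours:

```python
from typing import Any, Dict


def eplb_num_slots(gen_cfg: Dict[str, Any]) -> int:
    """Read num_slots from either load_balancer shape."""
    lb = gen_cfg.get("moe_config", {}).get("load_balancer", {})
    if isinstance(lb, str):
        # Worker-side config: value is a file path (e.g. moe_load_balancer.yaml),
        # so num_slots is not available inline.
        return 0
    return lb.get("num_slots", 0)
```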
tests/integration/defs/perf/disagg/compare_backends.py (2)

70-87: Slight cleanup: unused `base_case` from groupby key.

In:

```python
for (base_case, metric_type), group in grouped:
    ...
```

`base_case` is never used inside the loop. This is harmless but flagged by Ruff (B007) and slightly noisy.

You can either:

- Rename it to `_base_case` to acknowledge it's intentionally unused, or
- If you don't plan to log it, unpack only `metric_type`:

```python
for (_, metric_type), group in grouped:
    ...
```
145-343: HTML template: replace fullwidth parentheses to satisfy linters and avoid ambiguity.

The HTML template line:

```html
<li>❌ <strong>Fail</strong>: DEFAULT is slower than UCX{threshold}%（Performance degradation）</li>
```

uses fullwidth parentheses `（` and `）`, which Ruff flags as ambiguous (RUF001) and can cause subtle issues in some environments.

Change them to standard ASCII parentheses:

```diff
- <li>❌ <strong>Fail</strong>: DEFAULT is slower than UCX{threshold}%（Performance degradation）</li>
+ <li>❌ <strong>Fail</strong>: DEFAULT is slower than UCX{threshold}% (Performance degradation)</li>
```

No behavior change, just clearer source text and cleaner lint output.
tests/integration/defs/perf/disagg/reporting/report.py (4)
16-28: Avoid catching bare `Exception` unless you intend to swallow all errors.

`LogWriter.print_to_console` already handles `FileNotFoundError` and `PermissionError`, then falls back to:

```python
except Exception as e:
    logger.error(f"Error reading file: {e}")
```

Catching all exceptions without re-raising can hide programming errors (e.g., encoding issues, interrupted system calls) and makes debugging harder.

Either:

- Narrow the catch to specific expected exceptions, or
- Log and re-raise for unexpected ones:

```python
except OSError as e:
    logger.error(f"Error reading file: {log_file_name}: {e}")
    raise
```

or, if you truly want to swallow everything, add a brief comment documenting that intention.
135-139: Make GPU type lookup robust to unexpected `GPU_TYPE` values.

Currently:

```python
gpu_type = EnvManager.get_gpu_type()
gpu_config = GPU_RESOURCE_CONFIG[gpu_type]
lock_freq_graphics = gpu_config.get("lock_freq_graphics_mhz", 0) or 0
lock_freq_memory = gpu_config.get("lock_freq_memory_mhz", 0) or 0
```

If `GPU_TYPE` is set to a value not present in `GPU_RESOURCE_CONFIG`, this will raise `KeyError` and break log parsing, even though the lock frequencies are only used as metadata.

Consider a safe default:

```python
gpu_type = EnvManager.get_gpu_type()
gpu_config = GPU_RESOURCE_CONFIG.get(gpu_type, {})
lock_freq_graphics = gpu_config.get("lock_freq_graphics_mhz", 0) or 0
lock_freq_memory = gpu_config.get("lock_freq_memory_mhz", 0) or 0
```

Optionally also log a warning when the GPU type is unknown.
219-239: `_get_network_name` behavior doesn't align with documented input format.

Docstring says `_get_network_name` expects something like:

```
test_disagg_simple.py::TestDisaggBenchmark::test_benchmark[deepseek-r1_1k1k_...]-con-1
```

and extracts `deepseek-r1_1k1k_...-con-1` via:

```python
pattern = r"\[([^\]]+)\](-con-\d+)"
```

But `_convert_to_perf_result_format` currently calls it with:

```python
base_test_name = f"{test_prefix}_con:{concurrency}"
network_name = self._get_network_name(base_test_name)
```

i.e., no `[ ... ]` section and `"_con:{concurrency}"` instead of `"-con-1"`, so the regex will usually fail and you always fall back to `base_test_name.replace("/", "-")` (see the demonstration below).

This is likely either:

- A stale docstring, or
- A missed update where `base_test_name` should still carry the original pytest-style name.

Decide which representation you want and make them consistent. For example, if you want the short `deepseek-r1_...-con-1` form for readability:

- Ensure `test_prefix` is the full pytest id with `[...]` section, and
- Change the suffix construction to `-con-{concurrency}` to match the regex,

or, if the current `test_prefix` is already what you want, update the regex/docstring and simplify `_get_network_name` accordingly.
247-272: ResultSaver docstring no longer matches behavior.

Docstring:

```python
"""All of the benchmarks append to the same csv, add header to it each time.
No matter whether the columns are of the same count.
"""
```

Implementation:

```python
file_exists = os.path.exists(self.output_path) and os.path.getsize(self.output_path) > 0
if file_exists:
    df.to_csv(..., header=False)
else:
    df.to_csv(..., header=True)
```

So the header is only written on the first write, which is the correct behavior for a unified CSV but contradicts the "add header to it each time" wording.

Update the docstring to match the actual semantics, e.g., "append to the same CSV; write header only on first write."
tests/integration/defs/perf/disagg/utils/common.py (2)
99-121: Consider skipping placeholder paths when building `container_mount`.

`get_container_mount` uses several EnvManager getters whose defaults are placeholder strings:

```python
work_dir = EnvManager.get_work_dir()        # "<Your working directory>"
script_dir = EnvManager.get_script_dir()    # "<Your benchmark script directory>"
model_dir = EnvManager.get_model_dir()      # "<Your model and dataset directory>"
output_path = EnvManager.get_output_path()  # special-cased for directory creation
repo_dir = EnvManager.get_repo_dir()        # "<Your TensorRT-LLM repository directory>"
trtllm_wheel_path = EnvManager.get_trtllm_wheel_path()
...
mounts = [
    f"{work_dir}:{work_dir}",
    f"{script_dir}:{script_dir}",
    f"{model_dir}:{model_dir}",
    f"{output_path}:{output_path}",
]
if repo_dir:
    mounts.append(f"{repo_dir}:{repo_dir}")
...
```

If users forget to set these env vars, you end up with mount strings containing literal placeholders (e.g. `"<Your working directory>:<Your working directory>"`), which will likely cause container launch failures that are harder to interpret.

You already treat the placeholder specially in `get_output_path`; you could apply a similar guard here, e.g.:

```python
def _is_placeholder(path: str) -> bool:
    return path.startswith("<") and path.endswith(">")

...
for path in (work_dir, script_dir, model_dir, output_path):
    if path and not _is_placeholder(path):
        mounts.append(f"{path}:{path}")
...
if repo_dir and not _is_placeholder(repo_dir):
    mounts.append(f"{repo_dir}:{repo_dir}")
```

This keeps default configs usable in tests while failing more clearly when required env vars are missing in real runs.
133-194: `extract_config_fields` is tightly coupled to current config schema; document assumptions.

`extract_config_fields` indexes deeply into `config_data` with hardcoded keys:

```python
isl = config_data["benchmark"]["input_length"]
osl = config_data["benchmark"]["output_length"]
ctx_num = config_data["hardware"]["num_ctx_servers"]
gen_num = config_data["hardware"]["num_gen_servers"]
...
gen_tp_size = config_data["worker_config"]["gen"]["tensor_parallel_size"]
gen_batch_size = config_data["worker_config"]["gen"]["max_batch_size"]
...
cache_transceiver_backend = config_data["worker_config"]["gen"]["cache_transceiver_config"]["backend"]
...
```

This is fine for the curated disagg perf configs you're adding now, but:

- Any missing key (e.g., an optional `speculative_config` block, or a future rename in worker_config) will raise `KeyError` and break tooling.
- The function is implicitly defining the "required schema" for disagg configs, but that's not obvious from the call sites.

At minimum, consider:

- Documenting clearly in the docstring which fields are required and which are optional.
- Using `.get(..., default)` for fields that are truly optional (e.g., speculative_config, certain backends), keeping `[]` indexing only for must-have fields.

This will make future schema evolutions (new backends, new fields) less brittle while preserving strong validation where you actually depend on the field.
tests/integration/defs/perf/disagg/scripts/rename_configs.py (3)
14-25: Handle invalid/empty YAML more defensively.

`yaml.safe_load` can return `None` or a non-dict if a file is empty or malformed, in which case `config.get(...)` will raise `AttributeError` with a less-informative message.

You can fail fast with a clearer error:

```diff
-    with open(yaml_path, 'r') as f:
-        config = yaml.safe_load(f)
+    with open(yaml_path, "r") as f:
+        config = yaml.safe_load(f)
+
+    if not isinstance(config, dict):
+        raise ValueError(f"YAML config must be a mapping, got {type(config).__name__} for {yaml_path}")
```

This keeps the script robust while still surfacing bad configs clearly.
101-103: Narrow exception handling and address minor lint issues.

Both the per-file processing and the rename loop use broad `except Exception`, which can hide programming errors and makes debugging harder. The Ruff hints here are reasonable:

- For parsing/loading: catch `yaml.YAMLError` and `OSError` instead of bare `Exception`.
- For rename operations: catch `OSError`/`PermissionError` rather than all exceptions.

You can also drop the unused f-string at line 128:

```diff
-    print(f"\nRenaming complete!")
+    print("\nRenaming complete!")
```

These tweaks improve debuggability and satisfy the linter without changing behavior.
Also applies to: 123-128
1-1: Shebang vs executable bit.

The file has a shebang but will typically be invoked via `python rename_configs.py`. Either mark it executable in the repo or drop the shebang to silence EXE001; functionally it's fine as-is.

tests/integration/defs/perf/disagg/test_disagg.py (1)
112-117: Prefer `pytest.fail` over `assert False` and preserve original tracebacks.

A few places use `assert False` for control-flow failures and `raise e` inside `except` blocks:

- Lines 114-116, 190-192: `assert False, "..."`
- Lines 128-130, 205-207: `except Exception as e: ... raise e`

For tests:

- `assert False` is removed under `python -O` and is less explicit than `pytest.fail`.
- `raise e` discards the original traceback; `raise` preserves it.

A more idiomatic pytest style would be:

```diff
@@
-        if error_msg == "timeout":
-            assert False, f"Job execution timeout after 7200s: {job_id}"
-        else:
-            assert False, f"Job failed early: {error_msg} (job_id: {job_id})"
+        if error_msg == "timeout":
+            pytest.fail(f"Job execution timeout after 7200s: {job_id}")
+        else:
+            pytest.fail(f"Job failed early: {error_msg} (job_id: {job_id})")
@@
-    except Exception as e:
-        test_tracker.end_test_case()
-        raise e
+    except Exception:
+        test_tracker.end_test_case()
+        raise
@@
-        if error_msg == "timeout":
-            assert False, f"Accuracy test timeout after 10800s: {job_id}"
-        else:
-            assert False, f"Accuracy test failed early: {error_msg} (job_id: {job_id})"
+        if error_msg == "timeout":
+            pytest.fail(f"Accuracy test timeout after 10800s: {job_id}")
+        else:
+            pytest.fail(f"Accuracy test failed early: {error_msg} (job_id: {job_id})")
@@
-    except Exception as e:
-        test_tracker.end_test_case()
-        raise e
+    except Exception:
+        test_tracker.end_test_case()
+        raise
```

This keeps failures explicit and stack traces intact, and aligns with Ruff's B011/TRY201 suggestions.
Also applies to: 128-130, 188-192, 205-207
tests/integration/defs/perf/disagg/reporting/accuracy_validator.py (1)
204-210: Tidy docs and type hints in validator/threshold classesMinor polish items:
HypothesisTestValidator.validate’s docstring still mentions anexpected_valueparameter that no longer exists.DatasetThreshold._get_hypothesis_paramsusesDict[str, any];anyis the built-in function, nottyping.Any.Suggested tweaks:
- def validate(self, actual_value: float) -> tuple[bool, str]: + def validate(self, actual_value: float) -> tuple[bool, str]: @@ - Args: - actual_value: Actual accuracy value from test - expected_value: Expected accuracy value (for display consistency) + Args: + actual_value: Actual accuracy value from test @@ - def _get_hypothesis_params(self) -> Dict[str, any]: + def _get_hypothesis_params(self) -> Dict[str, Any]:You’d also need
from typing import Anyat the top if not already present.Also applies to: 255-276
tests/integration/defs/perf/disagg/README.md (3)
291-576: Align README examples and file names with the current implementation.

The "Core Implementation Code" sections still reference older module/file names and call patterns (e.g., `config_loader.py` in the current directory, `test_disagg_yaml.py`, `disagg_executor.py`, `disagg_report.py`, `list_configs.py`), but the actual code in this PR lives under:

- `tests/integration/defs/perf/disagg/utils/config_loader.py`
- `tests/integration/defs/perf/disagg/test_disagg.py`
- `tests/integration/defs/perf/disagg/execution/executor.py`
- (and any real `list_configs` tooling in this tree, if present)

This can be confusing for anyone trying to follow the README to run or extend the tests.

It would help to:

- Update the file names and imports in the code snippets to match the actual module layout and APIs.
- Clearly label any legacy examples as historical context if you intend to keep them.
- Ensure the `pytest` invocation examples use `test_disagg.py` (and any new test entrypoints) rather than `test_disagg_yaml.py`.

That keeps the documentation in sync with the current design and avoids sending readers to non-existent modules.

Also applies to: 578-796, 1030-1058
8-13: Clarify "filename vs YAML metadata" as the single source of truth.

The README currently sends mixed signals:

- Early on (Lines 8-13) it promotes "Filename as metadata: Parse model and benchmark type from filename, no YAML metadata needed."
- Later (Lines 1041-1045, 1239-1247) it emphasizes "Configuration as Data" and says filenames are only for human readability, with `model_name`, `benchmark_type`, etc. taken from YAML `metadata` and `sequence`.

The implementation (e.g., `TestConfig` + `ConfigLoader`) now clearly reads `model_name`, `benchmark_type`, and GPU support from YAML content, not filenames.

To avoid confusion, consider:

- Removing or softening the "Filename as metadata" claim.
- Explicitly stating that filenames are purely for humans and that YAML content is the authoritative source for `model_name`, `benchmark_type`, `precision`, `supported_gpus`, etc.

This will help future config authors know where they need to put truth.

Also applies to: 1041-1045, 1239-1247
19-33: Optional: Address markdownlint warnings (code fences, emphasis as headings).

Low-priority but easy cleanups if you want markdownlint to pass:

- Add languages to fenced code blocks (e.g. `bash`, `yaml`, `python`) instead of bare fences, for instance in the directory tree and decision tree blocks.
- Replace emphasized lines used as headings (e.g., `**Design Philosophy**`) with actual heading syntax (`### Design Philosophy`) rather than MD036-style emphasis-as-heading.

These don't affect readers much but will keep tooling quieter.

Also applies to: 1215-1226, 1239-1248
Also applies to: 1215-1226, 1239-1248
tests/integration/defs/perf/disagg/utils/config_loader.py (2)
443-472: Double-check `environment.work_dir` override source.

In `_apply_env_overrides`, `environment.work_dir` is populated from `EnvManager.get_script_dir()`:

```python
("environment", "work_dir"): lambda: EnvManager.get_script_dir(),
```

Given `EnvManager` provides both:

- `get_script_dir()` - benchmark script directory
- `get_work_dir()` - working directory

and the YAML examples use `environment.work_dir: <work_dir>`, it seems more natural for `work_dir` to come from `EnvManager.get_work_dir()`.

If the intention really is to point `work_dir` at the script directory inside the container, that's fine; otherwise consider:

```diff
-    ("environment", "work_dir"): lambda: EnvManager.get_script_dir(),
+    ("environment", "work_dir"): lambda: EnvManager.get_work_dir(),
```

to keep naming consistent across env vars, `EnvManager`, and YAML.
246-248: Optional: Narrow broad exception handling in loader and writer.

Two places catch bare `Exception`:

- Loading configs in `scan_configs` (lines 246-248).
- Writing configs in `_write_config_file` (lines 525-538).

Catching `Exception` here keeps the test harness resilient to bad YAML or I/O issues, but it also hides programming errors (e.g., `KeyError`, `TypeError`) that you might prefer to surface during development.

If you want a tighter failure mode, you could:

```diff
-        except Exception as e:
-            logger.warning(f"Failed to load {yaml_file}: {e}")
+        except (yaml.YAMLError, OSError, ValueError) as e:
+            logger.warning(f"Failed to load {yaml_file}: {e}")
```

and:

```diff
-        except Exception as e:
-            logger.warning(f"Failed to write config file {yaml_path}: {e}")
+        except OSError as e:
+            logger.warning(f"Failed to write config file {yaml_path}: {e}")
```

This still handles expected error modes without masking unrelated bugs.

Also applies to: 525-538
Also applies to: 525-538
tests/integration/defs/perf/disagg/execution/executor.py (10)
36-79: srun prefix construction looks solid; consider validating container image.

Reusing `GPU_RESOURCE_CONFIG` and `EnvManager` makes the `srun` prefix consistent with the rest of the Slurm tooling and looks correct. One minor improvement is to fail fast if `EnvManager.get_container_image()` is empty, instead of passing `--container-image=` to Slurm, which can cause confusing runtime errors.
81-141: Shell command construction could quote env-derived paths for robustness.

`build_script_command` interpolates `work_dir`, `output_path`, `repo_dir`, and `trtllm_wheel_path` directly into `bash -c` strings. If these env vars ever contain spaces or shell metacharacters, this can break the command or open you up to accidental shell injection.

Consider using `shlex.quote` around these values when building the command strings, e.g.:

```diff
+from shlex import quote
 ...
-    f"cd {work_dir} && python3 {work_dir}/simple_collect.py {output_path}",
+    f"cd {quote(work_dir)} && python3 {quote(work_dir)}/simple_collect.py {quote(output_path)}",
```

and similarly for the wheel/source branches.
143-181: Tighten typing and exception handling in `run_job`.

Two small points here:

- The signature uses an implicit Optional: `log_file: str = None`. Prefer `Optional[str]` (or `str | None` in 3.10+) to satisfy type checkers and Ruff (RUF013).
- The broad `except Exception` collapses timeouts, non-zero return codes, and internal errors into the same generic message. Narrowing this to `subprocess.TimeoutExpired`/`subprocess.CalledProcessError` plus a final catch-all would preserve more signal and make debugging failed jobs easier (see the sketch below).

These are quality-of-life improvements; the overall control flow is correct.
192-258: Job submission flow and temp config lifecycle look correct; minor nits only.

The `submit_job` implementation correctly:

- Writes the rendered YAML to `test_config.temp_config_path`.
- Invokes `submit.py` with a clear command.
- Parses the Slurm job id and cleans up the temp config on failure or exception.

Two minor nits:

- The inner `import re` is redundant since `re` is already imported at the module level.
- If Slurm output format changes (e.g., localized message), the `"Submitted batch job"` string match may fail silently; logging the full output on parse failure (you already log it as `Output:`) is good, so behavior is acceptable.

No functional blockers here.
259-338: Backup/archival behavior is careful and defensive.

`backup_logs` does a full copy of the result dir, appends an `_ERROR` suffix on failure, adds the Slurm log, and moves or falls back to copying the config. Error handling and cleanup of the temp config on backup failure are good.

Only minor consideration: `shutil.copytree` for large result directories can be expensive; if this becomes an issue, you might want to support a configurable toggle or symlink-based backups, but for test infra this is fine.
384-433: Avoid shadowing `check_result` name for clarity.

The local variable in `check_result`:

```python
check_result = JobManager._check_job_result(...)
...
return check_result
```

shadows the static method name `JobManager.check_result`. Not a bug, but slightly confusing when reading or debugging.

Renaming the local (e.g., `result = JobManager._check_job_result(...)`) would improve readability.
435-489: Early-failure checker works, but job_id is unused and inner `try/except` is very broad.

The log scanning logic for `output_gen_*.log`/`output_ctx_*.log` and the patterns you're matching look good.

Two small cleanups:

- `job_id` isn't used in `check_for_early_failure`; either log it in warnings or drop it from the signature to avoid confusion.
- The inner `try/except Exception: pass` (lines 481-483) suppresses all errors including programming mistakes. Since you already have an outer `except` that logs, consider at least logging the inner failure too or restricting the exception type (e.g., `OSError`).
715-765: Perf result handling is correct but could surface parse failures more explicitly.

`_check_perf_result` correctly:

- Delegates parsing to `LogParser`.
- Writes results to a single CSV via `ResultSaver`.
- Marks `success` only when a non-None DataFrame is produced.

Two optional improvements:

- When `parse_result["status"]` is `False` or `df` is `None`, you currently just return the default `{"status": "UNKNOWN", "success": False}`; if you extend `LogParser.parse` to return an error string, wiring that into `result["error"]` here would improve debuggability.
- `EnvManager.get_output_path()` already conditionally creates the directory when it's not a placeholder. If `OUTPUT_PATH` is left at the default placeholder, your extra `os.makedirs(output_path, exist_ok=True)` will now create a literal `<The csv ...>` directory. Mirroring the same "not a placeholder" check here would keep that behavior consistent.

Neither blocks correctness, but both would make failures easier to understand and behavior more predictable when env vars are unset.
768-823: Category routing in `_check_job_result` is clear; consider guarding unexpected categories.

Routing between accuracy and perf checks based on `test_category == "accuracy"` is simple and works, with a sane default to perf for everything else.

If you expect only `"perf"` and `"accuracy"`, you might want to validate `test_category` and log or raise on unexpected values instead of silently treating them as perf. That would catch misconfigured tests early.
178-180: Broad `except Exception` usage is pervasive; consider narrowing where practical.

Across several places (e.g., `run_job`, `submit_job`, `backup_logs`, `cleanup_result_dir`, `check_for_early_failure`, `check_job_status`, `cancel_job`), you use bare `except Exception:` blocks. For a test harness this is sometimes acceptable, but it does mask programming errors and makes static-analysis tools unhappy.

Where feasible, prefer:

- Specific exceptions (`OSError`, `subprocess.TimeoutExpired`, `subprocess.CalledProcessError`, `yaml.YAMLError`, etc.).
- A final broad catch that at least logs the full stack trace for truly unexpected errors.

You don't need to change all of them immediately, but tightening the most frequently-hit paths will improve diagnosability.
Also applies to: 247-257, 327-337, 354-356, 485-487, 507-509, 597-599
tests/integration/defs/perf/disagg/test_configs/wideep/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep32_bs16_eplb288_mtp3_ccb-NIXL.yaml (1)
14-18: Document required template placeholders.

This YAML configuration uses multiple unresolved placeholders (`<partition>`, `<account>`, `<container_mount>`, `<container_image>`, `<model_path>`, `<full_path_to_work_dir>`, `<dataset_file>`) that must be substituted before use. Add a comment block at the top of the file or in adjacent documentation describing:

- Which placeholders are required vs. optional
- Expected value formats for each placeholder
- Example substitution for reference

Consider adding a header comment:

```diff
+# YAML Configuration Template for Qwen3-235B-A22B-FP4 Disaggregated Inference Test
+#
+# Required placeholders (must be substituted before use):
+#   <partition>             - SLURM partition name (e.g., "gpu_cluster")
+#   <account>               - SLURM account/project (e.g., "ml_team")
+#   <container_mount>       - Host path for container mount (e.g., "/path/to/mount")
+#   <container_image>       - Container image URI (e.g., "docker.io/nvidia/pytorch:latest")
+#   <model_path>            - Path to model weights (e.g., "/models/Qwen3-235B-A22B-FP4")
+#   <full_path_to_work_dir> - Working directory for job (e.g., "/workspace/runs/test_20251121")
+#   <dataset_file>          - Dataset file path (e.g., "datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json")
+#
 metadata:
   model_name: Qwen3-235B-A22B-FP4
```
/bot run
PR_Github #25496 [ run ] triggered by Bot. Commit:
PR_Github #25496 [ run ] completed with state
5224dc9 to 7626a71 (Compare)
/bot run
PR_Github #25534 [ run ] triggered by Bot. Commit:
PR_Github #25534 [ run ] completed with state
7626a71 to bd6265e (Compare)
/bot run
1 similar comment
/bot run
PR_Github #25569 [ run ] triggered by Bot. Commit:
PR_Github #25569 [ run ] completed with state
/bot run
PR_Github #25612 [ run ] triggered by Bot. Commit:
PR_Github #25612 [ run ] completed with state
/bot run
PR_Github #25664 [ run ] triggered by Bot. Commit:
bd6265e to 73ce72e (Compare)
PR_Github #25664 [ run ] completed with state
/bot run
PR_Github #25689 [ run ] triggered by Bot. Commit:
PR_Github #25689 [ run ] completed with state
…ases Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
73ce72e to 3d7ea06 (Compare)
/bot reuse last-pipeline
GitHub Bot Help

Provide a user friendly way for developers to interact with a Jenkins server. See details below for each supported subcommand.

Details:

run: Launch build/test pipelines. All previously running jobs will be killed.
kill: Kill all running builds associated with pull request.
skip: Skip testing for latest commit on pull request.
reuse-pipeline: Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
/bot reuse-pipeline
PR_Github #25803 [ reuse-pipeline ] triggered by Bot. Commit:
PR_Github #25803 [ reuse-pipeline ] completed with state
(NVIDIA#8902) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> [https://nvbugs/5703953][fix] Use random port for disagg tests (NVIDIA#9582) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> [None][fix] Waive gb200 (NVIDIA#9580) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> [FMDL-1328][feat] Add support for nano-v3 and super-v3 with pytorch backend (NVIDIA#9261) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> [https://nvbugs/5582091][test] increase warmup times in testing for multi-gpu cases (NVIDIA#9578) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com> [None][chore] Add failed cases into waives.txt (NVIDIA#9588) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> [https://nvbugs/5702793][fix] Fix uncontiguous tensor view (NVIDIA#9576) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com> [None][infra] Waive failed cases for main branch (NVIDIA#9615) Signed-off-by: qqiao <qqiao@nvidia.com> [TRTLLM-9488][feat] use FlashInfer.sampling by default (NVIDIA#9545) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com> [None][infra] Update allowlist 2025/12/01 (NVIDIA#9616) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com> [None][infra] Remove an invalid test name in waives.txt (NVIDIA#9620) Signed-off-by: qqiao <qqiao@nvidia.com> Lock the gpu clocks in L0 perf tests (NVIDIA#9585) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com> [TRTLLM-9466][test] Evaluate helix parallelism with DSV3 Lite (NVIDIA#9597) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> [None][fix] Extract GPU count from single-node stage names (NVIDIA#9599) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com> [https://nvbugs/5667774][fix] Refine Piecewise Cuda Graph Condition for DP (NVIDIA#9393) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> [TRTLLM-9144][fix] enhance RPC robustness (NVIDIA#8711) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com> [https://nvbugs/5627710][fix] Fix synchronization bugs in KvCacheTransferManager that can cause corrupted blocks (NVIDIA#9056) Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com> Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com> Co-authored-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> [TRTLLM-8980][test] Clean up spec dec tests in test_llm_api_pytorch (NVIDIA#8889) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com> [NVIDIA#9150][feat] Add code for nano v3 to custom implementation in AD (NVIDIA#9465) * Why? We would like to show an alternative to monkey-patching in AutoDeploy. * What? This commit builds on the existing custom model implementation for NemotronH and adds the bits relevant for MoE layers. Part of NVIDIA#9150. 
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> [NVIDIA#9150][feat] AutoDeploy: reviewer comments for NVIDIA#9150 (NVIDIA#9527) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> [https://nvbugs/5651854][fix] Fix dist-serving perf by clearing CPU affinity (NVIDIA#9549) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> [NVIDIA#9550][feat] AutoDeploy: Add NVFP4 Cutlass MoE kernels (NVIDIA#9551) Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com> [https://nvbugs/5688388][fix] fix: Reducing num request in disagg test to speed up (NVIDIA#9598) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> [TRTLLM-8946][feat] Improved heuristics to detect shardable regions (NVIDIA#9200) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> [NVIDIA#9632][feat] Support EXTRA_WHEEL_BUILD_ARGS during wheel build (NVIDIA#9633) Signed-off-by: Yu Chi Li <yuchil@nvidia.com> [None][chore] Waive test failing on pre-merge (NVIDIA#9638) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> [None][chore] Remove traceback dump for multimodal input processor (NVIDIA#9634) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com> [None][chore] Fix trtllm-eval and move GroupedGemmInputsHelper (NVIDIA#9612) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> [https://nvbugs/5698434][fix] Use separate weight mapper for draft (NVIDIA#9607) Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com> [TRTLLM-7101][infra] Reuse passed tests (NVIDIA#6894) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> [None][test] Remove duplicate test cases (NVIDIA#9623) Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com> [None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com> [None][feat] Add RocketKV usage doc and e2e accuracy test on LongBenchV2 (NVIDIA#9572) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com> [TRTLLM-9242][doc] Add examples showcasing openai compatible APIs (NVIDIA#9520) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> [None][chore] AutoDeploy update cuda stream manager for multi-device (NVIDIA#9575) Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> [TRTLLM-9391][chore] Automatically estimate required workspace. 
(NVIDIA#9535) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> [https://nvbugs/5708475][fix] Fix e2e eval accuracy for helix parallelism (NVIDIA#9647) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> [https://nvbugs/5561153][test] Fix log error for perf test (NVIDIA#9622) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com> [TRTLLM-8241][feat] Aliasing to comply to LlmArgs (NVIDIA#9586) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> [None][chore] Add failed cases into waives.txt (NVIDIA#9593) Signed-off-by: Jie Li <lijie@nvidia.com> Co-authored-by: Jie Li <lijie@nvidia.com> [TRTLLM-6842][feat] Support Response API for general purpose (NVIDIA#9392) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> [None][test] Update Qwen3-next accuracy testing by setting the cuda … (NVIDIA#9613) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> [None][feat] update trtllm-gen nvfp4 kernels with better performance (NVIDIA#9510) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> [None][doc] Replace the tensorrt icon with torch icon on overview.md (NVIDIA#9644) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> [https://nvbugs/5705197][chore] Unwaive timeout disagg tests (NVIDIA#9637) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> [https://nvbugs/5552132][fix] Enable LoRa for GPT OSS Torch (NVIDIA#8253) Signed-off-by: Michal Guzek <mguzek@nvidia.com> [None][fix] Fix wide ep MoE error (NVIDIA#9642) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> [https://nvbugs/5702795][fix] Remove the warning message for aten.log. (NVIDIA#9665) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> [https://nvbugs/5693853][fix] Fix error handling when querying machin… (NVIDIA#9483) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> [OMNIML-2932] [feat] nvfp4 awq support (NVIDIA#8698) Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com> [NVIDIA#9643][fix] AutoDeploy: fix nano sharding config (NVIDIA#9668) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> [NVIDIA#9147][feat] AutoDeploy: Draft Target Speculative Decoding (NVIDIA#9275) Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com> [None][feat] Update Qwen3CodeToolParser to align tool-calling parameters (NVIDIA#9540) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> [TRTLLM-7181][infra] Generate test results when pytest timeout happens (NVIDIA#9396) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> [None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com> [TRTLLM-9522][fix] restore `trtllm-serve mm_embedding_serve` (NVIDIA#9669) [TRTLLM-5093][infra] Write env variables to a file in the interactive debug session (NVIDIA#6792) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> [None][fix] fix error when processing batches containing both text and mm data (NVIDIA#8381) Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn> [TRTLLM-7073][feat] Support torch compile for PP for Llama and DeepSeekV3 (NVIDIA#7838) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> [None][feat] Add weights initialization and context phase parser to layer-wise benchmarks (NVIDIA#9667) Signed-off-by: 
Tailing Yuan <yuantailing@gmail.com> [TRTLLM-8274][feat] Check if executor is shutdown in /health entrypoint (NVIDIA#9057) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> [NVIDIA#8733][feat] Add Llama4 MoE handling to AutoDeploy (NVIDIA#9556) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com> Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com> Co-authored-by: Neta Zmora <nzmora@nvidia.com> [None][ci] unwaive tests (NVIDIA#9651) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> [None][feat] Add NIXL-LIBFABRIC support (NVIDIA#9225) Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com> Signed-off-by: zackyoray <yorayz@nvidia.com> [None][test] rename wide ep and disagg metric name in perf test (NVIDIA#9704) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com> [https://nvbugs/5467531][fix] Unwaive fused_moe all to all test with … (NVIDIA#9617) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com> [None][fix] Recover TRTLLM MoE Perf for DEP (NVIDIA#9562) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com> [None][chore] Add failed cases into waives.txt (NVIDIA#9662) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> [None][fix] Fix TLLM_SPEC_DECODE_FORCE_NUM_ACCEPTED_TOKENS for MTP/EAGLE (NVIDIA#9608) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com> [None][infra] Add container notices and documentation (NVIDIA#9185) Signed-off-by: Parker Drake <pdrake@nvidia.com> [TRTLLM-5312][infra] Add triton trigger rules (NVIDIA#6440) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> [None][doc] Add feature docs for helix parallelism (NVIDIA#9684) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com> [TRTLLM-9579][infra] Set mergeWaiveList stage UNSTABLE when there is any issue (NVIDIA#9692) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> [None][doc] Added line about partial reuse (NVIDIA#7846) Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com> [TRTLLM-8920][feat] decouple disagg service from fastapi (NVIDIA#8714) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> [https://nvbugs/5633340][fix] start disagg workers and servers on free ports (NVIDIA#9694) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> [TRTLLM-9562] [doc] Add Deployment Guide for Kimi K2 Thinking on TensorRT LLM - Blackwell (NVIDIA#9711) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> [NVIDIA#9602][feat] AutoDeploy: Support TRTLLM Sampler (NVIDIA#9641) Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com> [None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com> [None] [tests] Unwaive EPLB tests (NVIDIA#9625) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> [https://nvbugs/5518713][test] Refactor core test lists by merging with llm_perf_cluster.yml (NVIDIA#9714) Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com> [TRTLLM-7136][feat] Update load_weights method to include mapping parameter in checkpoint loaders (NVIDIA#9583) Signed-off-by: 
Robin Kobus <19427718+Funatiq@users.noreply.github.com> [None][refactor] Improve request processing function in sampler (NVIDIA#9671) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> [https://nvbugs/5670672][fix] Fix flaky KV connector tests (NVIDIA#9676) Signed-off-by: jthomson04 <jwillthomson19@gmail.com> [None][infra] Update allowed list 20251204 (NVIDIA#9718) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com> [None][feat] AutoDeploy: Perf optimization for Attention and rmsnorm (NVIDIA#9719) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com> [None][chore] Waive flakey disagg tests (NVIDIA#9749) Signed-off-by: Mike Iovine <miovine@nvidia.com> [https://nvbugs/5601682][fix] Fix cacheTransceiver hang (NVIDIA#9311) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com> [TRTLLM-9199][docs] KV Connector Docs (NVIDIA#9325) Signed-off-by: jthomson04 <jwillthomson19@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com> [TRTLLM-9160][doc] add doc to llm_runtime.py (NVIDIA#9482) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com> [None][doc] VDR 1.0 trtllm-serve doc enhancement (NVIDIA#9443) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com> [TRTLLM-9086][doc] Clean up TODOs in documentation (NVIDIA#9292) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com> [TRTLLM-9157][doc] Guided decoding doc improvement (NVIDIA#9359) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com> [None][infra] Updated Linux installation guide (NVIDIA#9485) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com> [TRTLLM-9075][doc] refine the slurm examples (NVIDIA#9548) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com> [TRTLLM-9093][doc] update hyper links in overview (NVIDIA#9568) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com> [TRTLLM-9092][doc] link to modelopt checkpoints in quick start guide (NVIDIA#9571) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com> [None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com> [None][fix] Fix triton moe load_weight 
(NVIDIA#9649) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com> [None][fix] fix a bug: deepseek_fp8_block_scales in TRTLLMGEN-MoE use 2D x_sf instead of 1D (NVIDIA#9658) Signed-off-by: xxi <xxi@nvidia.com> [TRTLLM-9372][feat] Enable CuteDSL MoE with Large EP (NVIDIA#9592) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> [TRTLLM-9522][chore] implement default `attach_multimodal_embeddings` (NVIDIA#9664) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com> [TRTLLM-9660][feat] Convert cuteDSL GEMM to opt-in feature (NVIDIA#9682) Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> [None][fix] enable hmac in RPC (NVIDIA#9745) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> [None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com> [https://nvbugs/5703953][fix] Preserving ip:port for trtllm-serve before initializing llm (NVIDIA#9646) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com> [None][infra] Waive failed cases for main branch on 12/07 (NVIDIA#9769) Signed-off-by: qqiao <qqiao@nvidia.com> [None][fix] Several minor fixes to CI setting (NVIDIA#9765) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> [OMNIML-3036][doc] Re-branding TensorRT-Model-Optimizer as Nvidia Model-Optimizer (NVIDIA#9679) Signed-off-by: Chenjie Luo <chenjiel@nvidia.com> [None][feat] Enable NCCL_SYMMETRIC as default fallback for AllReduce (NVIDIA#9314) Signed-off-by: Ludwig Schneider <lschneider@nvidia.com> [TRTLLM-9000][feat] Add multi-node Perf Tests into CI (NVIDIA#8800) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> [None][test] add ntp tolerance in time metrics verification (NVIDIA#9741) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com> [TRTLLM-9603][feat] Enable ConfigurableMoE test in the CI (NVIDIA#9645) [https://nvbugs/5422621][test] Add GB 200 WIDEEP test case for RCCA 5422621 (NVIDIA#9506) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com> [None][fix] Fix two tuning cache miss issues. (NVIDIA#9743) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> [None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com> [TRTLLM-9706] [doc] Update wide EP documents (NVIDIA#9724) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> [https://nvbugs/5666804][test] only adding sampler config for limited models (NVIDIA#9512) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com> Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com> [None][infra] Waive failed cases for main on 12/08 (NVIDIA#9773) Signed-off-by: qqiao <qqiao@nvidia.com> [None][chore] Move the rocketkv e2e test to post-merge (NVIDIA#9768) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> [None][chore] Enable tvm_ffi for cute dsl nvfp4_gemm to reduce host overhead. 
(NVIDIA#9690) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com> [TRTLLM-9431][perf] Enable multistream for Linear Attention in Qwen3-… (NVIDIA#9696) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> [None][chore] Remove closed bugs (NVIDIA#9770) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> [None][infra] update mooncake in docker images (NVIDIA#9584) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com> Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> [None][test] Add Kimi k2 WIDEEP perf and accuracy cases (NVIDIA#9686) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com> Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> [https://nvbugs/5527655][test] Add test case for RCCA 5527655 (NVIDIA#9511) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com> [http://nvbugs/5649010][fix] fix test_auto_scaling.py::test_worker_restart timeout (NVIDIA#9775) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> [None][fix] Switch AutoDeploy's default allreduce strategy to NCCL (NVIDIA#9666) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com> [TRTLLM-9506][fix] Fix AR for DeepSeek-R1 2 model path (NVIDIA#9661) Signed-off-by: qgai <qgai@nvidia.com> ray + updatew works trtllm works in async env trtllm works in sync and async env ray + updatew works rebase to the updated verl server mode still cherry pick still cherry pick still cherry pick integrated http interface hang at RyExecutor create workers ray.remote clean code use tensorrt_llm.rlhf_utils Signed-off-by: Liwei Ma <liweim@nvidia.com> placement, asyncllm, and basic tests Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> connect sleep and wakeup; Add support to pass None to update_weights Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Batching ctx for IFB scheduler Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> accuracy WAR for TP>1: always use AllReduceStrategy.NCCL, refactored Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> fix e2e integration Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> update asyncllm, other nits Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> fix init setup Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Fix TRTLLMSampler logprobs perf Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> fix and cleanup Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> fix server Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Revert "Batching ctx for IFB scheduler" This reverts commit b51aac0 Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> update & address comments Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
…ases (NVIDIA#9356) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Summary by CodeRabbit
New Features
Add disagg and wideep multi-node, multi-GPU test cases.
Support test selection via a test list and pytest's `-k` filtering (see the first sketch below).
Support running all test cases through `submit.py` and YAML configuration files (see the second sketch below).
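A minimal sketch of the filtering workflow: `-k` is pytest's standard keyword filter, while the test directory path and the `--test-list` option name are assumptions for illustration, not taken from this PR:

```bash
# Filter cases with pytest's built-in -k keyword expression.
# The test directory path is illustrative.
pytest tests/integration/defs/perf/disagg -k "deepseek and disagg"

# Run a curated subset from a plain-text test list (one test id per
# line); the --test-list option name is an assumption about this harness.
pytest tests/integration/defs/perf/disagg --test-list my_tests.txt
```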
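And a minimal sketch of a full sweep driven by `submit.py`, assuming it scans a directory of YAML test configs and submits one SLURM job per config; the flag name and paths below are illustrative:

```bash
# Submit every YAML-defined test case as a SLURM job and collect
# results; --config-dir and the directory layout are assumptions.
python tests/integration/defs/perf/disagg/submit.py \
    --config-dir tests/integration/defs/perf/disagg/test_configs
```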