Conversation

@fredricz-20070104 (Collaborator) commented Nov 21, 2025

Summary by CodeRabbit

New Features

  • Introduced disaggregated benchmarking framework with YAML-based test configurations
  • Added support for performance and accuracy testing of disaggregated models
  • Enabled configuration templates for Qwen3 and DeepSeek models across multiple backends (NIXL, UCX, WIDEEP)
  • New pytest-based test harness for automated benchmark execution and result validation


Add disagg and wideep multi-node multi-GPU test cases.
Support test lists and -k filtering.
Support running all test cases via submit.py and YAML configuration files.

@coderabbitai bot (Contributor) commented Nov 21, 2025

📝 Walkthrough

Introduces a comprehensive disaggregated inference benchmarking framework including SLURM job submission, YAML configuration management, job execution orchestration, result parsing for performance and accuracy metrics, and an extensive test harness with parametrized test configurations across multiple hardware and model scenarios.

Changes

Cohort / File(s) — Summary
Configuration Generation
examples/disaggregated/slurm/benchmark/gen_worker_config.py
New script to generate YAML configuration files for context and generation workers with support for speculative decoding, MOE backends, and various parallelism strategies.
Job Submission
examples/disaggregated/slurm/benchmark/submit.sh
New Bash script to submit SLURM benchmark jobs with configurable parameters for hardware, dataset, container, and benchmarking modes.
Test Configuration Framework
tests/integration/defs/perf/disagg/utils/config_loader.py, config_validator.py, common.py
New utilities for loading YAML test configurations, validating them, extracting fields, and managing environment-based overrides with metrics and accuracy configuration support.
Job Execution & Tracking
tests/integration/defs/perf/disagg/execution/executor.py, subprocess_utils.py
tests/integration/defs/perf/disagg/utils/trackers.py, logger.py
New modules for SLURM job execution, result collection, log management, test case and session tracking, and unified logging infrastructure.
Result Parsing & Reporting
tests/integration/defs/perf/disagg/reporting/report.py, accuracy_parser.py, accuracy_types.py, accuracy_validator.py
New result parsing pipeline for performance metrics extraction, accuracy validation with hypothesis testing, and typed result containers.
Test Harness
tests/integration/defs/perf/disagg/test_disagg.py, conftest.py
New pytest-based test harness with parametrized performance and accuracy tests, session lifecycle management, and test collection filtering via external test lists.
Configuration & Documentation
tests/integration/defs/perf/disagg/README.md, test_configs/README.md, pyproject.toml, pytest.ini, tests/integration/defs/pytest.ini
Documentation and project configuration for the disaggregated benchmarking framework, including Poetry dependencies and pytest markers.
Test Configuration Files
test_configs/disagg/perf/*, test_configs/wideep/perf/*, test_configs/wideep/accuracy/*
~30 YAML configuration files defining benchmark scenarios for Qwen3-235B and DeepSeek models with various parallelism, backend, and hardware configurations (NIXL, UCX, DEFAULT, WIDEEP).
Test Lists & Utilities
testlist/all.txt, testlist/debug.txt, testlist/disagg.txt, testlist/wideep.txt
scripts/rename_configs.py, simple_collect.py
Test case lists for different test suites, plus utilities for config file renaming and system information collection.
Package Structure
execution/__init__.py, reporting/__init__.py, scripts/__init__.py, utils/__init__.py, envs/ENV.md
Package markers and directory structure for the disaggregated test framework.
Backend Comparison
tests/integration/defs/perf/disagg/compare_backends.py
New script to compare performance results between DEFAULT (NIXL) and UCX backends, computing regressions and generating HTML reports.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Runner (pytest)
    participant Config as ConfigLoader
    participant JobMgr as JobManager
    participant Executor as SlurmExecutor
    participant Parser as Result Parser
    
    Test->>Config: scan_configs()
    Config->>Config: load YAML, apply env overrides
    Config-->>Test: TestConfig (with metrics_config)
    
    Test->>JobMgr: submit_job(test_config)
    JobMgr->>Executor: write config, sbatch
    Executor-->>JobMgr: job_id
    
    JobMgr->>JobMgr: wait_for_completion(job_id)
    Note over JobMgr: poll sacct, check early failure
    JobMgr-->>Test: completion status
    
    Test->>JobMgr: check_result(job_id, test_config)
    JobMgr->>Parser: parse logs with metrics_config
    Parser-->>JobMgr: results (perf or accuracy)
    JobMgr->>JobMgr: backup_logs, cleanup
    JobMgr-->>Test: result dict
    
    Test->>Test: assert results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • Job execution flow (executor.py): Complex orchestration of SLURM submission, result collection, error handling, and result routing; interactions between SlurmRunCommandBuilder, JobManager, and external tooling need verification.
  • Configuration loading and overrides (config_loader.py, common.py): Merging of default metrics, environment-based field substitutions, and YAML I/O; ensure all override paths are correct and non-destructive.
  • Accuracy parsing with regex extraction (accuracy_parser.py): Regex pattern matching and multi-run result aggregation logic; validate correctness of parsing edge cases and dataset filtering.
  • Test parametrization and lifecycle (test_disagg.py, conftest.py): Session lifecycle, test collection filtering, and tracking state across async job execution; ensure proper cleanup and no test interference.
  • Performance metric extraction (report.py): Log parsing, metric extraction from raw results, DataFrame formatting, and CSV appending; verify metric naming, concurrency derivation, and column ordering.
  • Large number of YAML configurations: 30+ config files should be spot-checked for consistency in structure, valid field ranges, and proper environment placeholder usage.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Description check — ⚠️ Warning: The PR description is minimal and lacks required sections; it provides only a one-line summary without explaining the issue, solution, test coverage, or a proper checklist. Resolution: expand the description with why these test cases are needed, how the implementation works, what test coverage exists, and the completed PR checklist items.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: The title clearly describes the main change: adding disagg and wideep multi-node multi-GPU test cases.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 90.77%, which meets the required threshold of 80.00%.

@coderabbitai bot (Contributor) left a review comment

Actionable comments posted: 13

🧹 Nitpick comments (51)
tests/integration/defs/perf/disagg/pyproject.toml (2)

9-15: Standardize version constraint syntax across dependencies.

Version constraints use inconsistent syntax: explicit bounds (lines 11, 12, 15) vs. caret notation (lines 13, 14). For maintainability, prefer explicit bounds consistently.

 [tool.poetry.dependencies]
 python = ">=3.10"
 pytest = ">=8.4.2,<9.0.0"
 pandas = ">=2.3.2,<3.0.0"
-psutil = "^7.1.0"
-pyyaml = "^6.0.3"
+psutil = ">=7.1.0,<8.0.0"
+pyyaml = ">=6.0.3,<7.0.0"
 scipy = ">=1.11.0,<2.0.0"

5-5: Consider adding email to author metadata.

Poetry conventions typically include author email: authors = ["Fredric Zhu <email@example.com>"].

tests/integration/defs/perf/disagg/test_configs/README.md (1)

7-12: Minor markdown formatting issue: missing language identifier for fenced code block.

Line 7 starts a fenced code block without specifying a language identifier. Update to include the appropriate language for better rendering and linting compliance.

Apply this diff to fix the markdown linting issue:

-```
+```text
 test_configs/
examples/disaggregated/slurm/benchmark/submit.sh (2)

106-124: Consider parameterizing sbatch arguments instead of hardcoding positional args.

The sbatch call passes 27 positional arguments across multiple lines (106-124), making the mapping between run_single parameters and sbatch positional arguments fragile and difficult to maintain. A typo or reordering is error-prone.

Consider refactoring to use a configuration file (YAML/JSON) or environment variables to pass parameters to the SLURM script instead of positional arguments. This would align with the new YAML-based configuration framework (referenced in the PR summary) and reduce maintenance burden.

Alternatively, document the parameter mapping clearly in a comment above run_single to aid future maintenance.
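
As an illustration of the config-driven approach, here is a minimal Python sketch that reads the benchmark parameters from a YAML file and forwards them to sbatch as named --export variables; the file name, YAML keys, and sbatch script path are hypothetical, not the actual submit.py interface:

```python
# Hypothetical sketch: drive sbatch from a YAML config instead of 27
# positional arguments. Keys and file names are illustrative only.
import shlex
import subprocess

import yaml


def submit_from_config(config_path: str, sbatch_script: str = "benchmark.slurm") -> str:
    with open(config_path) as f:
        params = yaml.safe_load(f)  # assumed to be a flat mapping

    # Named environment variables keep the parameter mapping explicit and
    # order-independent, unlike positional sbatch arguments.
    exports = ",".join(f"{key.upper()}={shlex.quote(str(value))}"
                       for key, value in params.items())
    cmd = ["sbatch", f"--export=ALL,{exports}", sbatch_script]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

run_single could then shrink to a thin wrapper that selects one of the config files, and -k-style filtering becomes a matter of choosing which files to load.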


131-140: Hardcoded example configurations lack discoverability.

Lines 131-140 define 10 hardcoded benchmark configurations via run_single calls. These examples are not discoverable or filterable programmatically. If users need to run a subset (e.g., via -k filtering mentioned in the PR objective), they cannot easily do so from this script.

Consider extracting these example configurations into a separate YAML/JSON file (or leveraging the existing configuration framework under tests/integration/defs/perf/disagg/test_configs/) so that:

  • Configurations can be listed, filtered, and selected programmatically.
  • The script can load and iterate configurations from a central source.
  • The script stays aligned with the PR's goal of supporting a test list and -k filtering.

The commented-out 8k-1k variants (lines 142-156) also suggest that configuration management should be more structured.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp3_ccb-UCX.yaml (1)

1-10: Consider adding dataset_file to metadata for consistency.

Some configuration files in this PR include dataset_file in the metadata section (e.g., the wideep variants), while others don't. For consistency across all test configurations, consider adding it here as well, since it's referenced in the benchmark section at Line 27.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb0_mtp3_ccb-UCX.yaml (2)

1-10: Consider adding dataset_file to metadata for consistency.

This configuration is missing dataset_file in the metadata section, while other configs in this PR include it. Adding it would improve consistency across test configurations.


24-24: Inconsistent concurrency_list format across configurations.

This config uses space-separated values (512 1075), while other configs use quoted strings (e.g., '2150', '1075'). Standardizing the format across all configurations would improve maintainability.

tests/integration/defs/perf/disagg/test_configs/wideep/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep16_bs64_eplb288_mtp3_ccb-UCX.yaml (1)

25-25: Inconsistent concurrency_list format across configurations.

This config uses space-separated values (512 1075), while other configs use quoted strings (e.g., '2150', '1075'). Standardizing the format across all configurations would improve maintainability.

tests/integration/defs/perf/disagg/test_configs/disagg/perf/Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-UCX.yaml (1)

1-10: Consider adding dataset_file to metadata for consistency.

This configuration is missing dataset_file in the metadata section, while other configs in this PR (particularly the wideep variants) include it. Adding it would improve consistency across test configurations.

tests/integration/defs/perf/disagg/test_configs/wideep/perf/deepseek-r1-fp4_8k1k_ctx2_gen1_dep32_bs128_eplb288_mtp3_ccb-DEFAULT.yaml (1)

58-75: Duplicate 128 entry in CUDA graph batch sizes

cuda_graph_config.batch_sizes includes 128 twice (once in the main sequence and once at the end). It’s harmless but noisy and may confuse future readers or generators.

You can simplify by dropping the duplicate:

       - 768
       - 1024
-      - 2048
-      - 128
+      - 2048
tests/integration/defs/perf/disagg/utils/config_validator.py (1)

18-33: Docstrings advertise ValueError/FileNotFoundError, but implementation only asserts

validate_test_config and the private _validate_* helpers document ValueError / FileNotFoundError in their Raises sections, but the current implementation only uses assert, which surfaces as AssertionError (and can be optimized out with -O).

Either adjust the docs to reflect AssertionError only, or convert the assertions into explicit exceptions, for example:

-        if mtp_size > 0:
-            assert gen_max_tokens == (gen_max_batch_size * (mtp_size + 1)), \
-                "config error: gen_max_tokens != gen_max_batch_size * (mtp_size + 1)"
+        if mtp_size > 0 and gen_max_tokens != gen_max_batch_size * (mtp_size + 1):
+            raise AssertionError(
+                "config error: gen_max_tokens != gen_max_batch_size * (mtp_size + 1)"
+            )

and similarly for the streaming and max‑seq‑len checks.

Also applies to: 45-84

tests/integration/defs/perf/disagg/utils/trackers.py (3)

39-54: Align get_timestamps type hints with returned values

get_timestamps() is annotated as returning Dict[str, str], but total_time__sec is a float in both branches. This can confuse callers and type-checkers.

Consider either changing the return type to something like Dict[str, object] / Dict[str, Union[str, float]], or converting the duration to a string before returning.
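
A minimal sketch of the honest annotation, with illustrative field names:

```python
# Sketch: strings for timestamps, float for the duration, and a return type
# that admits both. Field names mirror the review comment, not the real class.
from datetime import datetime, timezone
from typing import Dict, Union


def get_timestamps(start: float, end: float) -> Dict[str, Union[str, float]]:
    fmt = "%Y-%m-%d %H:%M:%S"
    return {
        "start_time": datetime.fromtimestamp(start, tz=timezone.utc).strftime(fmt),
        "end_time": datetime.fromtimestamp(end, tz=timezone.utc).strftime(fmt),
        "total_time__sec": end - start,  # float, now covered by the annotation
    }
```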


75-95: Guard against misconfigured OUTPUT_PATH before running session-collect

end_and_collect() assumes EnvManager.get_output_path() returns a real directory. If it’s still the default placeholder (starts with <) or otherwise invalid, log_file will point to a non-existent path and run_job will fail later with a less clear error.

It would be more robust to:

  • Check whether output_path is empty or starts with "<"; in that case, log a clear configuration error and return False without calling run_job.
  • Optionally mirror that guard in _update_csv_timestamps() so you never try to touch a CSV in an invalid location.

This keeps failures deterministic and surfaces configuration issues earlier.
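
A self-contained sketch of that guard, where _is_placeholder and the trailing comment stand in for the real tracker internals:

```python
# Sketch: refuse to run session-collect against an unset/placeholder path.
import logging

logger = logging.getLogger(__name__)


def _is_placeholder(path: str) -> bool:
    # EnvManager defaults look like "<Your ... directory>"
    return not path or path.startswith("<")


def end_and_collect(output_path: str) -> bool:
    if _is_placeholder(output_path):
        logger.error("OUTPUT_PATH is unset or still a placeholder (%r); "
                     "skipping session-collect.", output_path)
        return False
    # ... proceed to submit the session-collect job via run_job(...) ...
    return True
```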


96-118: Narrow the catch-all exception in _update_csv_timestamps

Catching bare Exception around pd.read_csv / to_csv will also swallow unexpected programming errors (schema issues, type errors, etc.), and only a generic message is logged.

To improve diagnosability (and satisfy Ruff BLE001) consider:

  • Catching only expected I/O-related exceptions (OSError, pd.errors.EmptyDataError, etc.), and
  • Letting truly unexpected exceptions bubble up, or at least logging them with full context (CSV path, repr(e)).

This keeps the tracker resilient while avoiding over-broad error masking.
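
A sketch of the narrowed handling; csv_path and the column names stand in for the tracker's real fields:

```python
# Sketch: catch only expected I/O failures, let programming errors propagate.
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def update_csv_timestamps(csv_path: str, start: str, end: str) -> None:
    try:
        df = pd.read_csv(csv_path)
    except (OSError, pd.errors.EmptyDataError) as e:
        logger.warning("Could not read %s for timestamp update: %r", csv_path, e)
        return
    df["session_start"] = start
    df["session_end"] = end
    df.to_csv(csv_path, index=False)  # unexpected errors now surface with context
```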

tests/integration/defs/perf/disagg/utils/logger.py (1)

108-136: Import-time auto-configuration is side-effectful

Importing utils.logger immediately:

  • Constructs the global logger,
  • Calls EnvManager.get_output_path(), potentially creating directories, and
  • Tries to attach a file handler inside a broad except Exception block.

That’s convenient for the disagg harness, but if this module ever gets reused as a library it can surprise callers and complicate unit tests.

If you anticipate reuse, consider moving the EnvManager/OUTPUT_PATH detection behind an explicit initializer (e.g., init_logging() called from the test entrypoint) and keeping this module focused on logger construction. You could also narrow the bottom-level except Exception to expected import/path errors to align with Ruff BLE001.
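
A sketch of such an initializer, assuming a module-level logger and the "<...>" placeholder convention described above:

```python
# Sketch: no handlers attached at import time; callers opt in explicitly.
import logging
import os

logger = logging.getLogger("disagg")


def init_logging(output_path: str | None = None) -> None:
    """Attach console (and optionally file) handlers on demand."""
    logger.addHandler(logging.StreamHandler())
    if output_path and not output_path.startswith("<"):
        os.makedirs(output_path, exist_ok=True)
        logger.addHandler(
            logging.FileHandler(os.path.join(output_path, "disagg.log")))
```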

tests/integration/defs/perf/disagg/reporting/accuracy_parser.py (1)

49-60: Improve log-file read error handling and exception scope

parse_and_validate() wraps the log-file read in a bare except Exception and encodes the message into the returned result, but doesn’t emit a structured log entry. This both triggers Ruff BLE001 and makes debugging harder.

You can keep the external behavior while tightening things up by:

  • Catching specific I/O/decoding issues (OSError, UnicodeDecodeError, etc.), and
  • Logging the failure via logger.error(...) before returning the success=False result.

That way callers still get a clean AccuracyValidationResult, and you gain better observability.
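
A sketch of the tightened read path; the AccuracyValidationResult shape and the parse_and_validate signature are assumptions based on the description above:

```python
# Sketch: specific exceptions, a structured log entry, and a clean result.
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class AccuracyValidationResult:  # simplified stand-in for the real type
    success: bool
    message: str = ""


def parse_and_validate(log_path: str) -> AccuracyValidationResult:
    try:
        with open(log_path, encoding="utf-8") as f:
            text = f.read()
    except (OSError, UnicodeDecodeError) as e:
        logger.error("Failed to read accuracy log %s: %r", log_path, e)
        return AccuracyValidationResult(success=False, message=str(e))
    # ... regex extraction and hypothesis testing on `text` ...
    return AccuracyValidationResult(success=True)
```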

tests/integration/defs/perf/disagg/simple_collect.py (3)

2-11: Docstring is out of sync with actual outputs

The top-level docstring lists four generated files (CSV + three .txt files), but the script also writes trtllm_version.txt and includes it in the summary.

Recommend updating the docstring to mention trtllm_version.txt so users know to expect that artifact.


37-41: Avoid fully silent except Exception: pass patterns for diagnostics

In collect_system_info() and TextWriter, several blocks use broad except Exception with either no logging or just pass. For a diagnostics script this keeps things resilient, but it also:

  • Hides unexpected bugs (regex issues, parsing mistakes, I/O problems), and
  • Leaves only “unknown” values with no indication of why.

A more informative pattern would be to:

  • Catch narrower, expected failure modes (FileNotFoundError, subprocess.CalledProcessError, re.error, etc.), and/or
  • Log a short DEBUG/INFO message when you fall back to "unknown".

That preserves robustness while improving debuggability and should also address Ruff’s BLE001/S110 concerns in these regions.

Also applies to: 77-82, 97-122, 209-210, 229-230
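
A sketch of the logged-fallback pattern for one collector; the nvidia-smi query illustrates the idea and is not the script's exact command:

```python
# Sketch: narrow exceptions plus a DEBUG breadcrumb explaining the "unknown".
import logging
import subprocess

logger = logging.getLogger(__name__)


def get_driver_version() -> str:
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10, check=True,
        ).stdout
        return out.strip().splitlines()[0]
    except (FileNotFoundError, subprocess.CalledProcessError,
            subprocess.TimeoutExpired, IndexError) as e:
        logger.debug("Driver version unavailable, falling back to 'unknown': %r", e)
        return "unknown"
```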


238-293: Consider reducing or gating the TensorRT-LLM retry delay

write_trtllm_version() can take up to ~70 seconds if tensorrt_llm isn’t importable or hangs: two subprocess.run(..., timeout=30) calls plus a hard-coded 10-second sleep between them. For environments where TRT-LLM isn’t installed, that’s a lot of latency for a non-essential text file.

You might want to either:

  • Lower the timeouts/sleep, or
  • Make the second attempt conditional on an env flag (e.g., DISAGG_RETRY_TRTLLM=1), so default runs fail fast but more patient retries are opt-in.

Behavior stays unchanged when the flag is set, while speeding up misconfigured or TRT-LLM-less runs.
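
A sketch of the opt-in retry under the hypothetical DISAGG_RETRY_TRTLLM flag; the version probe command is illustrative:

```python
# Sketch: one fast attempt by default; the slow retry is opt-in via env flag.
import os
import subprocess
import time


def probe_trtllm_version() -> str:
    cmd = ["python3", "-c", "import tensorrt_llm; print(tensorrt_llm.__version__)"]
    attempts = 2 if os.environ.get("DISAGG_RETRY_TRTLLM") == "1" else 1
    for attempt in range(attempts):
        try:
            return subprocess.run(cmd, capture_output=True, text=True,
                                  timeout=30, check=True).stdout.strip()
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
            if attempt + 1 < attempts:
                time.sleep(10)  # only paid when retries are explicitly enabled
    return "unknown"
```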

examples/disaggregated/slurm/benchmark/gen_worker_config.py (2)

25-49: Docstring is out of sync with the actual function signature

The gen_config_file docstring still documents parameters like config_path, model_path, num_ctx_servers, worker_start_port, and server_port that are not in the current signature, and it doesn’t describe several real parameters (work_dir, ctx_free_gpu_memory_fraction, gen_gpu_memory_fraction, mtp_size, cache_transceiver_max_num_tokens, etc.).

This is likely to confuse future callers trying to use this function programmatically.

Apply an update like:

 def gen_config_file(work_dir: str,
@@
-    """
-    Generate configuration YAML file for disaggregated inference.
- 
-    Args:
-        config_path: Path to save the config file
-        model_path: Path to the model
-        num_ctx_servers: Number of context servers
-        ctx_tp_size: Tensor parallel size for context servers
-        ctx_pp_size: Pipeline parallel size for context servers
-        ctx_batch_size: Batch size for context servers
-        ctx_max_num_tokens: Max number of tokens for context servers
-        ctx_max_seq_len: Max sequence length for context servers
-        ctx_free_gpu_memory_fraction: Free GPU memory fraction for context servers
-        ctx_enable_attention_dp: Enable attention DP for context servers
-        num_gen_servers: Number of generation servers
-        gen_tp_size: Tensor parallel size for generation servers
-        gen_pp_size: Pipeline parallel size for generation servers
-        gen_batch_size: Batch size for generation servers
-        gen_max_num_tokens: Max number of tokens for generation servers
-        gen_enable_attention_dp: Enable attention DP for generation servers
-        gen_gpu_memory_fraction: GPU memory fraction for generation servers
-        eplb_num_slots: Number of slots for eplb
-        worker_start_port: Start port for workers
-        server_port: Server port
-    """
+    """
+    Generate ctx/gen worker configuration YAML files for disaggregated inference.
+
+    Args:
+        work_dir: Directory where `ctx_config.yaml` and `gen_config.yaml` will be written.
+        ctx_tp_size: Tensor parallel size for context workers.
+        ctx_pp_size: Pipeline parallel size for context workers.
+        ctx_batch_size: Max batch size for context workers.
+        ctx_max_num_tokens: Max number of tokens for context workers.
+        ctx_max_seq_len: Max sequence length for context workers.
+        ctx_free_gpu_memory_fraction: Fraction of GPU memory reserved for KV cache on ctx workers.
+        ctx_enable_attention_dp: Whether to enable attention data parallel on ctx workers.
+        gen_tp_size: Tensor parallel size for gen workers.
+        gen_pp_size: Pipeline parallel size for gen workers.
+        gen_batch_size: Max batch size for gen workers.
+        gen_max_num_tokens: Max number of tokens for gen workers.
+        gen_max_seq_len: Max sequence length for gen workers.
+        gen_enable_attention_dp: Whether to enable attention data parallel on gen workers.
+        gen_gpu_memory_fraction: Fraction of GPU memory reserved for KV cache on gen workers.
+        eplb_num_slots: Number of MOE load balancer slots (0 disables load balancer config).
+        mtp_size: Number of next‑N predict layers for MTP speculative decoding (0 disables).
+        cache_transceiver_max_num_tokens: Max tokens in cache transceiver buffer for both ctx/gen.
+    """

124-137: Clarify moe_config.load_balancer semantics vs test config expectations

Here gen_config['moe_config']['load_balancer'] is set to the path of moe_load_balancer.yaml:

moe_load_balancer_file = os.path.join(work_dir, "moe_load_balancer.yaml")
...
gen_config['moe_config']['load_balancer'] = moe_load_balancer_file

In contrast, the disagg test configs (e.g. tests/integration/defs/perf/disagg/test_configs/...) store load_balancer as an inline dict, and extract_config_fields in tests/integration/defs/perf/disagg/utils/common.py assumes:

eplb_slots = (
    config_data["worker_config"]["gen"]
    .get("moe_config", {})
    .get("load_balancer", {})
    .get("num_slots", 0)
)

So:

  • For worker configs produced here, a string path is probably what the runtime expects.
  • For test configs, a nested dict is expected for offline analysis.

To avoid accidental reuse of worker configs where a dict is required, consider:

  • Documenting clearly that this script outputs worker‑side configs, not the top‑level disagg test YAML shape; or
  • Using a different key name (e.g. load_balancer_file) at worker level if the runtime allows.

Please confirm that no code paths attempt to run extract_config_fields or similar dict‑based access on the YAMLs generated by this script.

tests/integration/defs/perf/disagg/compare_backends.py (2)

70-87: Slight cleanup: unused base_case from groupby key

In:

for (base_case, metric_type), group in grouped:
    ...

base_case is never used inside the loop. This is harmless but flagged by Ruff (B007) and slightly noisy.

You can either:

  • Rename it to _base_case to acknowledge it’s intentionally unused, or
  • If you don’t plan to log it, unpack only metric_type:
for (_, metric_type), group in grouped:
    ...

145-343: HTML template: replace fullwidth parentheses to satisfy linters and avoid ambiguity

The HTML template line:

<li><strong>Fail</strong>: DEFAULT is slower than UCX{threshold}%(Performance degradation)</li>

uses fullwidth parentheses （）, which Ruff flags as ambiguous (RUF001) and which can cause subtle issues in some environments.

Change them to standard ASCII parentheses:

-                <li>❌ <strong>Fail</strong>: DEFAULT is slower than UCX{threshold}%(Performance degradation)</li>
+                <li>❌ <strong>Fail</strong>: DEFAULT is slower than UCX{threshold}% (Performance degradation)</li>

No behavior change, just clearer source text and cleaner lint output.

tests/integration/defs/perf/disagg/reporting/report.py (4)

16-28: Avoid catching bare Exception unless you intend to swallow all errors

LogWriter.print_to_console already handles FileNotFoundError and PermissionError, then falls back to:

except Exception as e:
    logger.error(f"Error reading file: {e}")

Catching all exceptions without re‑raising can hide programming errors (e.g., encoding issues, interrupted system calls) and makes debugging harder.

Either:

  • Narrow the catch to specific expected exceptions, or
  • Log and re‑raise for unexpected ones:
except OSError as e:
    logger.error(f"Error reading file: {log_file_name}: {e}")
    raise

or, if you truly want to swallow everything, add a brief comment documenting that intention.


135-139: Make GPU type lookup robust to unexpected GPU_TYPE values

Currently:

gpu_type = EnvManager.get_gpu_type()
gpu_config = GPU_RESOURCE_CONFIG[gpu_type]
lock_freq_graphics = gpu_config.get("lock_freq_graphics_mhz", 0) or 0
lock_freq_memory = gpu_config.get("lock_freq_memory_mhz", 0) or 0

If GPU_TYPE is set to a value not present in GPU_RESOURCE_CONFIG, this will raise KeyError and break log parsing, even though the lock frequencies are only used as metadata.

Consider a safe default:

gpu_type = EnvManager.get_gpu_type()
gpu_config = GPU_RESOURCE_CONFIG.get(gpu_type, {})
lock_freq_graphics = gpu_config.get("lock_freq_graphics_mhz", 0) or 0
lock_freq_memory = gpu_config.get("lock_freq_memory_mhz", 0) or 0

Optionally also log a warning when the GPU type is unknown.


219-239: _get_network_name behavior doesn’t align with documented input format

Docstring says _get_network_name expects something like:

test_disagg_simple.py::TestDisaggBenchmark::test_benchmark[deepseek-r1_1k1k_...]-con-1

and extracts deepseek-r1_1k1k_...-con-1 via:

pattern = r"\[([^\]]+)\](-con-\d+)"

But _convert_to_perf_result_format currently calls it with:

base_test_name = f"{test_prefix}_con:{concurrency}"
network_name = self._get_network_name(base_test_name)

i.e., no [ ... ] section and "_con:{concurrency}" instead of "-con-1", so the regex will usually fail and you always fall back to base_test_name.replace("/", "-").

This is likely either:

  • A stale docstring, or
  • A missed update where base_test_name should still carry the original pytest-style name.

Decide which representation you want and make them consistent. For example, if you want the short deepseek-r1_...-con-1 form for readability:

  • Ensure test_prefix is the full pytest id with [...] section, and
  • Change the suffix construction to -con-{concurrency} to match the regex,

or, if the current test_prefix is already what you want, update the regex/docstring and simplify _get_network_name accordingly.
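
A sketch of the first option (keep the full pytest id and a -con-{concurrency} suffix so the existing regex matches); names are illustrative:

```python
# Sketch: suffix format and regex agree, so the fallback is truly a fallback.
import re


def get_network_name(full_test_id: str) -> str:
    match = re.search(r"\[([^\]]+)\](-con-\d+)", full_test_id)
    if match:
        return match.group(1) + match.group(2)
    return full_test_id.replace("/", "-")


# Caller side builds the name the regex actually expects:
# base_test_name = f"{pytest_id}-con-{concurrency}"
# e.g. "...test_benchmark[deepseek-r1_1k1k_x]-con-1" -> "deepseek-r1_1k1k_x-con-1"
```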


247-272: ResultSaver docstring no longer matches behavior

Docstring:

"""All of the benchmarks append to the same csv, add header to it each time.

No matter whether the columns are of the same count.
"""

Implementation:

file_exists = os.path.exists(self.output_path) and os.path.getsize(self.output_path) > 0
if file_exists:
    df.to_csv(..., header=False)
else:
    df.to_csv(..., header=True)

So the header is only written on the first write, which is the correct behavior for a unified CSV but contradicts the “add header to it each time” wording.

Update the docstring to match the actual semantics, e.g., “append to the same CSV; write header only on first write.”

tests/integration/defs/perf/disagg/utils/common.py (2)

99-121: Consider skipping placeholder paths when building container_mount

get_container_mount uses several EnvManager getters whose defaults are placeholder strings:

work_dir = EnvManager.get_work_dir()        # "<Your working directory>"
script_dir = EnvManager.get_script_dir()    # "<Your benchmark script directory>"
model_dir = EnvManager.get_model_dir()      # "<Your model and dataset directory>"
output_path = EnvManager.get_output_path()  # special-cased for directory creation
repo_dir = EnvManager.get_repo_dir()        # "<Your TensorRT-LLM repository directory>"
trtllm_wheel_path = EnvManager.get_trtllm_wheel_path()
...
mounts = [
    f"{work_dir}:{work_dir}",
    f"{script_dir}:{script_dir}",
    f"{model_dir}:{model_dir}",
    f"{output_path}:{output_path}",
]
if repo_dir:
    mounts.append(f"{repo_dir}:{repo_dir}")
...

If users forget to set these env vars, you end up with mount strings containing literal placeholders (e.g. "<Your working directory>:<Your working directory>"), which will likely cause container launch failures that are harder to interpret.

You already treat placeholder specially in get_output_path; you could apply a similar guard here, e.g.:

def _is_placeholder(path: str) -> bool:
    return path.startswith("<") and path.endswith(">")

...
for path in (work_dir, script_dir, model_dir, output_path):
    if path and not _is_placeholder(path):
        mounts.append(f"{path}:{path}")
...
if repo_dir and not _is_placeholder(repo_dir):
    mounts.append(f"{repo_dir}:{repo_dir}")

This keeps default configs usable in tests while failing more clearly when required env vars are missing in real runs.


133-194: extract_config_fields is tightly coupled to current config schema; document assumptions

extract_config_fields indexes deeply into config_data with hardcoded keys:

isl = config_data["benchmark"]["input_length"]
osl = config_data["benchmark"]["output_length"]
ctx_num = config_data["hardware"]["num_ctx_servers"]
gen_num = config_data["hardware"]["num_gen_servers"]
...
gen_tp_size = config_data["worker_config"]["gen"]["tensor_parallel_size"]
gen_batch_size = config_data["worker_config"]["gen"]["max_batch_size"]
...
cache_transceiver_backend = config_data["worker_config"]["gen"]["cache_transceiver_config"]["backend"]
...

This is fine for the curated disagg perf configs you’re adding now, but:

  • Any missing key (e.g., an optional speculative_config block, or a future rename in worker_config) will raise KeyError and break tooling.
  • The function is implicitly defining the “required schema” for disagg configs, but that’s not obvious from the call sites.

At minimum, consider:

  • Documenting clearly in the docstring which fields are required and which are optional.
  • Using .get(..., default) for fields that are truly optional (e.g., speculative_config, certain backends), keeping [] indexing only for must‑have fields.

This will make future schema evolutions (new backends, new fields) less brittle while preserving strong validation where you actually depend on the field.
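
A sketch of the mixed strict/lenient access; the key names come from the snippet above, and the defaults (0, "DEFAULT") are assumptions:

```python
# Sketch: hard-fail on must-have fields, tolerate absent optional blocks.
def extract_config_fields(config_data: dict) -> dict:
    gen = config_data["worker_config"]["gen"]  # required: fail loudly if missing
    return {
        "isl": config_data["benchmark"]["input_length"],
        "osl": config_data["benchmark"]["output_length"],
        "gen_tp_size": gen["tensor_parallel_size"],
        # optional blocks: absence should not raise KeyError
        "mtp_size": gen.get("speculative_config", {})
                       .get("num_nextn_predict_layers", 0),
        "cache_transceiver_backend": gen.get("cache_transceiver_config", {})
                                        .get("backend", "DEFAULT"),
    }
```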

tests/integration/defs/perf/disagg/scripts/rename_configs.py (3)

14-25: Handle invalid/empty YAML more defensively

yaml.safe_load can return None or a non-dict if a file is empty or malformed, in which case config.get(...) will raise AttributeError with a less-informative message.

You can fail fast with a clearer error:

-    with open(yaml_path, 'r') as f:
-        config = yaml.safe_load(f)
+    with open(yaml_path, "r") as f:
+        config = yaml.safe_load(f)
+
+    if not isinstance(config, dict):
+        raise ValueError(f"YAML config must be a mapping, got {type(config).__name__} for {yaml_path}")

This keeps the script robust while still surfacing bad configs clearly.


101-103: Narrow exception handling and address minor lint issues

Both the per-file processing and the rename loop use broad except Exception, which can hide programming errors and makes debugging harder. The Ruff hints here are reasonable:

  • For parsing/loading: catch yaml.YAMLError and OSError instead of bare Exception.
  • For rename operations: catch OSError/PermissionError rather than all exceptions.

You can also drop the unused f-string at line 128:

-            print(f"\nRenaming complete!")
+            print("\nRenaming complete!")

These tweaks improve debuggability and satisfy the linter without changing behavior.

Also applies to: 123-128


1-1: Shebang vs executable bit

The file has a shebang but will typically be invoked via python rename_configs.py. Either mark it executable in the repo or drop the shebang to silence EXE001; functionally it's fine as-is.

tests/integration/defs/perf/disagg/test_disagg.py (1)

112-117: Prefer pytest.fail over assert False and preserve original tracebacks

A few places use assert False for control-flow failures and raise e inside except blocks:

  • Lines 114–116, 190–192: assert False, "..."
  • Lines 128–130, 205–207: except Exception as e: ... raise e

For tests:

  • assert False is removed under python -O and is less explicit than pytest.fail.
  • raise e discards the original traceback; raise preserves it.

A more idiomatic pytest style would be:

@@
-                    if error_msg == "timeout":
-                        assert False, f"Job execution timeout after 7200s: {job_id}"
-                    else:
-                        assert False, f"Job failed early: {error_msg} (job_id: {job_id})"
+                    if error_msg == "timeout":
+                        pytest.fail(f"Job execution timeout after 7200s: {job_id}")
+                    else:
+                        pytest.fail(f"Job failed early: {error_msg} (job_id: {job_id})")
@@
-        except Exception as e:
-            test_tracker.end_test_case()
-            raise e
+        except Exception:
+            test_tracker.end_test_case()
+            raise
@@
-                    if error_msg == "timeout":
-                        assert False, f"Accuracy test timeout after 10800s: {job_id}"
-                    else:
-                        assert False, f"Accuracy test failed early: {error_msg} (job_id: {job_id})"
+                    if error_msg == "timeout":
+                        pytest.fail(f"Accuracy test timeout after 10800s: {job_id}")
+                    else:
+                        pytest.fail(f"Accuracy test failed early: {error_msg} (job_id: {job_id})")
@@
-        except Exception as e:
-            test_tracker.end_test_case()
-            raise e
+        except Exception:
+            test_tracker.end_test_case()
+            raise

This keeps failures explicit and stack traces intact, and aligns with Ruff’s B011/TRY201 suggestions.

Also applies to: 128-130, 188-192, 205-207

tests/integration/defs/perf/disagg/reporting/accuracy_validator.py (1)

204-210: Tidy docs and type hints in validator/threshold classes

Minor polish items:

  • HypothesisTestValidator.validate’s docstring still mentions an expected_value parameter that no longer exists.
  • DatasetThreshold._get_hypothesis_params uses Dict[str, any]; any is the built-in function, not typing.Any.

Suggested tweaks:

-    def validate(self, actual_value: float) -> tuple[bool, str]:
+    def validate(self, actual_value: float) -> tuple[bool, str]:
@@
-        Args:
-            actual_value: Actual accuracy value from test
-            expected_value: Expected accuracy value (for display consistency)
+        Args:
+            actual_value: Actual accuracy value from test
@@
-    def _get_hypothesis_params(self) -> Dict[str, any]:
+    def _get_hypothesis_params(self) -> Dict[str, Any]:

You’d also need from typing import Any at the top if not already present.

Also applies to: 255-276

tests/integration/defs/perf/disagg/README.md (3)

291-576: Align README examples and file names with the current implementation

The “Core Implementation Code” sections still reference older module/file names and call patterns (e.g., config_loader.py in the current directory, test_disagg_yaml.py, disagg_executor.py, disagg_report.py, list_configs.py), but the actual code in this PR lives under:

  • tests/integration/defs/perf/disagg/utils/config_loader.py
  • tests/integration/defs/perf/disagg/test_disagg.py
  • tests/integration/defs/perf/disagg/execution/executor.py
  • (and any real list_configs tooling in this tree, if present)

This can be confusing for anyone trying to follow the README to run or extend the tests.

It would help to:

  • Update the file names and imports in the code snippets to match the actual module layout and APIs.
  • Clearly label any legacy examples as historical context if you intend to keep them.
  • Ensure the pytest invocation examples use test_disagg.py (and any new test entrypoints) rather than test_disagg_yaml.py.

That keeps the documentation in sync with the current design and avoids sending readers to non-existent modules.

Also applies to: 578-796, 1030-1058


8-13: Clarify “filename vs YAML metadata” as the single source of truth

The README currently sends mixed signals:

  • Early on (Lines 8–13) it promotes “Filename as metadata: Parse model and benchmark type from filename, no YAML metadata needed.”
  • Later (Lines 1041–1045, 1239–1247) it emphasizes “Configuration as Data” and says filenames are only for human readability, with model_name, benchmark_type, etc. taken from YAML metadata and sequence.

The implementation (e.g., TestConfig + ConfigLoader) now clearly reads model_name, benchmark_type, and GPU support from YAML content, not filenames.

To avoid confusion, consider:

  • Removing or softening the “Filename as metadata” claim.
  • Explicitly stating that filenames are purely for humans and that YAML content is the authoritative source for model_name, benchmark_type, precision, supported_gpus, etc.

This will help future config authors know where the source of truth lives.

Also applies to: 1041-1045, 1239-1247


19-33: Optional: Address markdownlint warnings (code fences, emphasis as headings)

Low-priority but easy cleanups if you want markdownlint to pass:

  • Add languages to fenced code blocks, e.g. ```bash, ```yaml, ```python instead of bare ``` (e.g., the directory tree and decision tree blocks).
  • Replace emphasized lines used as headings (e.g., **Design Philosophy**) with actual heading syntax (### Design Philosophy) rather than MD036-style emphasis-as-heading.

These don’t affect readers much but will keep tooling quieter.

Also applies to: 1215-1226, 1239-1248

tests/integration/defs/perf/disagg/utils/config_loader.py (2)

443-472: Double-check environment.work_dir override source

In _apply_env_overrides, environment.work_dir is populated from EnvManager.get_script_dir():

("environment", "work_dir"): lambda: EnvManager.get_script_dir(),

Given EnvManager provides both:

  • get_script_dir() – benchmark script directory
  • get_work_dir() – working directory

and the YAML examples use environment.work_dir: <work_dir>, it seems more natural for work_dir to come from EnvManager.get_work_dir().

If the intention really is to point work_dir at the script directory inside the container, that’s fine; otherwise consider:

-            ("environment", "work_dir"): lambda: EnvManager.get_script_dir(),
+            ("environment", "work_dir"): lambda: EnvManager.get_work_dir(),

to keep naming consistent across env vars, EnvManager, and YAML.


246-248: Optional: Narrow broad exception handling in loader and writer

Two places catch bare Exception:

  • Loading configs in scan_configs (lines 246–248).
  • Writing configs in _write_config_file (lines 525–538).

Catching Exception here keeps the test harness resilient to bad YAML or I/O issues, but it also hides programming errors (e.g., KeyError, TypeError) that you might prefer to surface during development.

If you want a tighter failure mode, you could:

-                    except Exception as e:
-                        logger.warning(f"Failed to load {yaml_file}: {e}")
+                    except (yaml.YAMLError, OSError, ValueError) as e:
+                        logger.warning(f"Failed to load {yaml_file}: {e}")

and:

-        except Exception as e:
-            logger.warning(f"Failed to write config file {yaml_path}: {e}")
+        except OSError as e:
+            logger.warning(f"Failed to write config file {yaml_path}: {e}")

This still handles expected error modes without masking unrelated bugs.

Also applies to: 525-538

tests/integration/defs/perf/disagg/execution/executor.py (10)

36-79: srun prefix construction looks solid; consider validating container image

Reusing GPU_RESOURCE_CONFIG and EnvManager makes the srun prefix consistent with the rest of the Slurm tooling and looks correct. One minor improvement is to fail fast if EnvManager.get_container_image() is empty, instead of passing --container-image= to Slurm, which can cause confusing runtime errors.
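
A sketch of the fail-fast check, assuming the getter returns an empty value when the env var is unset:

```python
# Sketch: surface a clear configuration error instead of "--container-image=".
def build_srun_prefix(container_image: str | None) -> list[str]:
    if not container_image:
        raise ValueError(
            "CONTAINER_IMAGE is not set; refusing to build an srun command "
            "with an empty --container-image.")
    return ["srun", f"--container-image={container_image}"]
```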


81-141: Shell command construction could quote env-derived paths for robustness

build_script_command interpolates work_dir, output_path, repo_dir, and trtllm_wheel_path directly into bash -c strings. If these env vars ever contain spaces or shell metacharacters, this can break the command or open you up to accidental shell injection.

Consider using shlex.quote around these values when building the command strings, e.g.:

+from shlex import quote
 ...
-                    f"cd {work_dir} && python3 {work_dir}/simple_collect.py {output_path}",
+                    f"cd {quote(work_dir)} && python3 {quote(work_dir)}/simple_collect.py {quote(output_path)}",

and similarly for the wheel/source branches.


143-181: Tighten typing and exception handling in run_job

Two small points here:

  • The signature uses an implicit Optional: log_file: str = None. Prefer Optional[str] (or str | None in 3.10+) to satisfy type checkers and Ruff (RUF013).
  • The broad except Exception collapses timeouts, non-zero return codes, and internal errors into the same generic message. Narrowing this to subprocess.TimeoutExpired / subprocess.CalledProcessError plus a final catch-all would preserve more signal and make debugging failed jobs easier.

These are quality-of-life improvements; the overall control flow is correct.


192-258: Job submission flow and temp config lifecycle look correct; minor nits only

The submit_job implementation correctly:

  • Writes the rendered YAML to test_config.temp_config_path.
  • Invokes submit.py with a clear command.
  • Parses the Slurm job id and cleans up the temp config on failure or exception.

Two minor nits:

  • The inner import re is redundant since re is already imported at the module level.
  • If Slurm output format changes (e.g., localized message), the "Submitted batch job" string match may fail silently; logging the full output on parse failure (you already log it as Output:) is good, so behavior is acceptable.

No functional blockers here.
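
For the second nit, a sketch of a tolerant job-id parse around sbatch's standard "Submitted batch job <id>" line, with a logged fallback:

```python
# Sketch: regex match with a warning that preserves the raw output for triage.
import logging
import re

logger = logging.getLogger(__name__)


def parse_job_id(sbatch_output: str) -> str | None:
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match:
        return match.group(1)
    logger.warning("No job id found in sbatch output: %r", sbatch_output)
    return None
```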


259-338: Backup/archival behavior is careful and defensive

backup_logs does a full copy of the result dir, appends an _ERROR suffix on failure, adds the Slurm log, and moves or falls back to copying the config. Error handling and cleanup of the temp config on backup failure are good.

Only minor consideration: shutil.copytree for large result directories can be expensive; if this becomes an issue, you might want to support a configurable toggle or symlink-based backups, but for test infra this is fine.


384-433: Avoid shadowing check_result name for clarity

The local variable in check_result:

check_result = JobManager._check_job_result(...)
...
return check_result

shadows the static method name JobManager.check_result. Not a bug, but slightly confusing when reading or debugging.

Renaming the local (e.g., result = JobManager._check_job_result(...)) would improve readability.


435-489: Early-failure checker works, but job_id is unused and inner try/except is very broad

The log scanning logic for output_gen_*.log / output_ctx_*.log and the patterns you’re matching look good.

Two small cleanups:

  • job_id isn’t used in check_for_early_failure; either log it in warnings or drop it from the signature to avoid confusion.
  • The inner try/except Exception: pass (Lines [481-483]) suppresses all errors including programming mistakes. Since you already have an outer except that logs, consider at least logging the inner failure too or restricting the exception type (e.g., OSError).

715-765: Perf result handling is correct but could surface parse failures more explicitly

_check_perf_result correctly:

  • Delegates parsing to LogParser.
  • Writes results to a single CSV via ResultSaver.
  • Marks success only when a non-None DataFrame is produced.

Two optional improvements:

  • When parse_result["status"] is False or df is None, you currently just return the default {"status": "UNKNOWN", "success": False}; if you extend LogParser.parse to return an error string, wiring that into result["error"] here would improve debuggability.
  • EnvManager.get_output_path() already conditionally creates the directory when it’s not a placeholder. If OUTPUT_PATH is left at the default placeholder, your extra os.makedirs(output_path, exist_ok=True) will now create a literal <The csv ...> directory. Mirroring the same “not a placeholder” check here would keep that behavior consistent.

Neither blocks correctness, but both would make failures easier to understand and behavior more predictable when env vars are unset.


768-823: Category routing in _check_job_result is clear; consider guarding unexpected categories

Routing between accuracy and perf checks based on test_category == "accuracy" is simple and works, with a sane default to perf for everything else.

If you expect only "perf" and "accuracy", you might want to validate test_category and log or raise on unexpected values instead of silently treating them as perf. That would catch misconfigured tests early.
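
A sketch of guarded routing, with stubs standing in for JobManager's real checkers:

```python
# Sketch: dispatch table plus an explicit error for unknown categories.
def _check_perf_result(job_id: str) -> dict:
    return {"status": "PERF", "success": True}      # stub


def _check_accuracy_result(job_id: str) -> dict:
    return {"status": "ACCURACY", "success": True}  # stub


def check_job_result(job_id: str, test_category: str) -> dict:
    handlers = {"perf": _check_perf_result, "accuracy": _check_accuracy_result}
    try:
        return handlers[test_category](job_id)
    except KeyError:
        raise ValueError(
            f"Unexpected test_category {test_category!r}; "
            f"expected one of {sorted(handlers)}") from None
```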


178-180: Broad except Exception usage is pervasive; consider narrowing where practical

Across several places (e.g., run_job, submit_job, backup_logs, cleanup_result_dir, check_for_early_failure, check_job_status, cancel_job), you use bare except Exception: blocks. For a test harness this is sometimes acceptable, but it does mask programming errors and makes static-analysis tools unhappy.

Where feasible, prefer:

  • Specific exceptions (OSError, subprocess.TimeoutExpired, subprocess.CalledProcessError, yaml.YAMLError, etc.).
  • A final broad catch that at least logs the full stack trace for truly unexpected errors.

You don’t need to change all of them immediately, but tightening the most frequently-hit paths will improve diagnosability.

Also applies to: 247-257, 327-337, 354-356, 485-487, 507-509, 597-599
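
A sketch of that layered pattern on one representative path; the function name and command are illustrative:

```python
# Sketch: specific exceptions first, one broad catch that keeps the traceback.
import logging
import subprocess

logger = logging.getLogger(__name__)


def run_slurm_command(cmd: list[str], timeout: int) -> str | None:
    try:
        return subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout, check=True).stdout
    except subprocess.TimeoutExpired:
        logger.error("Command timed out after %ss: %s", timeout, cmd)
    except subprocess.CalledProcessError as e:
        logger.error("Command failed (rc=%s): %s; stderr: %s",
                     e.returncode, cmd, e.stderr)
    except Exception:
        logger.exception("Unexpected error running %s", cmd)  # full traceback
        raise
    return None
```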

tests/integration/defs/perf/disagg/test_configs/wideep/perf/Qwen3-235B-A22B-FP4_1k1k_ctx1_gen1_dep32_bs16_eplb288_mtp3_ccb-NIXL.yaml (1)

14-18: Document required template placeholders.

This YAML configuration uses multiple unresolved placeholders (<partition>, <account>, <container_mount>, <container_image>, <model_path>, <full_path_to_work_dir>, <dataset_file>) that must be substituted before use. Add a comment block at the top of the file or in adjacent documentation describing:

  • Which placeholders are required vs. optional
  • Expected value formats for each placeholder
  • Example substitution for reference

Consider adding a header comment:

+# YAML Configuration Template for Qwen3-235B-A22B-FP4 Disaggregated Inference Test
+# 
+# Required placeholders (must be substituted before use):
+#   <partition>          - SLURM partition name (e.g., "gpu_cluster")
+#   <account>            - SLURM account/project (e.g., "ml_team")
+#   <container_mount>    - Host path for container mount (e.g., "/path/to/mount")
+#   <container_image>    - Container image URI (e.g., "docker.io/nvidia/pytorch:latest")
+#   <model_path>         - Path to model weights (e.g., "/models/Qwen3-235B-A22B-FP4")
+#   <full_path_to_work_dir> - Working directory for job (e.g., "/workspace/runs/test_20251121")
+#   <dataset_file>       - Dataset file path (e.g., "datasets/deepseek-r1-1024-1024-100000-ratio-1_for_serve.json")
+#
 metadata:
   model_name: Qwen3-235B-A22B-FP4

@fredricz-20070104 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25496 [ run ] triggered by Bot. Commit: bd27204

@tensorrt-cicd (Collaborator)

PR_Github #25496 [ run ] completed with state SUCCESS. Commit: bd27204
/LLM/main/L0_MergeRequest_PR pipeline #19308 completed with status: 'FAILURE'

@fredricz-20070104 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25534 [ run ] triggered by Bot. Commit: 7626a71

@tensorrt-cicd (Collaborator)

PR_Github #25534 [ run ] completed with state SUCCESS. Commit: 7626a71
/LLM/main/L0_MergeRequest_PR pipeline #19334 completed with status: 'FAILURE'

@fredricz-20070104 (Collaborator, Author)

/bot run

@fredricz-20070104 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25569 [ run ] triggered by Bot. Commit: bd6265e

@tensorrt-cicd (Collaborator)

PR_Github #25569 [ run ] completed with state SUCCESS. Commit: bd6265e
/LLM/main/L0_MergeRequest_PR pipeline #19366 completed with status: 'FAILURE'

@fredricz-20070104 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25612 [ run ] triggered by Bot. Commit: bd6265e

@tensorrt-cicd (Collaborator)

PR_Github #25612 [ run ] completed with state SUCCESS. Commit: bd6265e
/LLM/main/L0_MergeRequest_PR pipeline #19406 completed with status: 'FAILURE'

@fredricz-20070104 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25664 [ run ] triggered by Bot. Commit: bd6265e

@tensorrt-cicd (Collaborator)

PR_Github #25664 [ run ] completed with state FAILURE. Commit: bd6265e

@fredricz-20070104 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #25689 [ run ] triggered by Bot. Commit: 73ce72e

@tensorrt-cicd (Collaborator)

PR_Github #25689 [ run ] completed with state SUCCESS. Commit: 73ce72e
/LLM/main/L0_MergeRequest_PR pipeline #19472 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

…ases

Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
@fredricz-20070104 (Collaborator, Author)

/bot reuse last-pipeline

@github-actions

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@fredricz-20070104 (Collaborator, Author)

/bot reuse-pipeline

@tensorrt-cicd (Collaborator)

PR_Github #25803 [ reuse-pipeline ] triggered by Bot. Commit: 3d7ea06

@tensorrt-cicd (Collaborator)

PR_Github #25803 [ reuse-pipeline ] completed with state SUCCESS. Commit: 3d7ea06
Reusing PR_Github #25689 for commit 3d7ea06

@LarryXFly merged commit 6a64cb4 into NVIDIA:main Nov 26, 2025
5 checks passed
MinaHuai pushed a commit to davidmlw/TensorRT-LLM that referenced this pull request Dec 10, 2025
…VIDIA#8779)

The performance results of some kernels could be easily affected by the warm/cold L2 cache status. To achieve more precise profiling results, the L2 cache is cleared for every execution by the circular buffer method for better benchmarking during autotuning.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

[None][infra] Waive failed cases for main branch on 11/25 (NVIDIA#9429)

Signed-off-by: qqiao <qqiao@nvidia.com>

[NVIDIA#8391][chore] test_perf.py to lock clocks read from gpu_configs.yml instead of max freq (NVIDIA#9409)

Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>

[None][ci] Move more test stages to use OCI machines (NVIDIA#9395)

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Matt Lefebvre <matthewelefebvre@gmail.com>

[None][feat] Improve TRTLLM MoE in small hidden size throughput cases (NVIDIA#9377)

Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>

[https://nvbugs/5537996][fix] Let KV cache manager block initialization be aware whether it is doing a dry run or not (NVIDIA#9093)

Before this commit, the kv cache manager does the same regardless, which causes a mis-calculation in free memory available to allocate for the KV cache manager, hence causing a crash.

This commit fixes this by letting KV cache manager initialization be aware whether it is doing the dry run or not. If it is a dry run, use the max_tokens setting that is already pre-calculated and filled into kv_cache_config.max_tokens.

Signed-off-by: eopXD <yuehtingc@nvidia.com>

[https://nvbugs/5667922][fix] Update long context evaluation config (NVIDIA#9426)

Signed-off-by: mni <125171826+baize97@users.noreply.github.com>

[None][fix] Mitigate test timeout issues (NVIDIA#9445)

Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

[None][chore] Fix trtllm-eval for PyTorchLLM (NVIDIA#9427)

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

[None][feat] Add a parser to layer-wise benchmarks (NVIDIA#9440)

Signed-off-by: Tailing Yuan <yuantailing@gmail.com>

[None][feat] Support custom chat template for tool calling (NVIDIA#9297)

Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

[TRTLLM-8160][feat] Add draft token tree runtime on CDL (NVIDIA#8586)

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

[None][ci] waive a test (NVIDIA#9458)

Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>

[https://nvbugs/5680905][fix] Relax the MMLU accuracy requirement for DS-v3.2 (NVIDIA#9439)

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

[TRTLLM-8376][feat] top-p optimization (removes redundant softmax) (NVIDIA#9411)

Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

[TRTLLM-9490][feat] use FlashInfer's top_k_sampling_from_probs (NVIDIA#9457)

Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

[https://nvbugs/5647400] [fix] Enlarged the AllReduce workspace size to 64MB. Added AllReduce strategy to AD config. (NVIDIA#9145)

Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>

[TRTLLM-909][feat] Overlap context chunks in pipeline parallel mode (NVIDIA#9308)

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

[None][chore] AutoDeploy add multi stream moe pass to default.yaml (NVIDIA#9430)

Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>

[https://nvbugs/5685143][fix] avoid cudaFree overlap with cuda graph (NVIDIA#9438)

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

[None][chore] Bump version to 1.2.0rc5 (NVIDIA#9455)

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

[TRTLLM-8936][test] Add disagg and wideep multi-node multi-gpu test cases (NVIDIA#9356)

Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>

[None][ci] move some slow test cases of DGX-B200 to post merge (NVIDIA#9467)

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

[TRTLLM-9293][feat] Enable partial weight loading to support streaming update weights (NVIDIA#9224)

Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>

[None][infra] Check in most recent lock file from nightly pipeline

Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>

[TRTLLM-9264][fix] Add accuracy/unit tests/doc for phi4mm (NVIDIA#9246)

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

[https://nvbugs/5580099][fix] Cherry pick IMA issue fix from release/1.1 (NVIDIA#9032)

Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>

[None][chore] Upgrade CuteDSL to 4.3.0 (NVIDIA#9444)

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

[None][feat] Support MLA chunked prefill for DeepSeek V3.2 model (NVIDIA#9376)

Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>

[None][feat] Add environment variable to force spec-dec number of accepted tokens (NVIDIA#9371)

Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>

[None][infra] Update allowed list 2025.11.25 (NVIDIA#9468)

Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>

[None][infra] Fail the pipeline when slurm ssh dropped (NVIDIA#9157)

Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>

[None][feat] AutoDeploy: Remove redundant copies in mamba layers (NVIDIA#9461)

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>

[None][feat] AutoDeploy: Add A_log fusion for Mamba layers (NVIDIA#9422)

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

[None][ci] Waive blackwell test on spec gate. (NVIDIA#9502)

Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>

[https://nvbugs/5608930][fix] Fix a typo (NVIDIA#9487)

Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

[NVIDIA#9463][feat] Add revision option to trtllm commands (NVIDIA#9498)

Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>

[TRTLLM-9085][doc] fix math formula rendering issues (NVIDIA#9481)

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

[None][chore] update comments in llm_args.py (NVIDIA#9472)

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

[None][infra] Check in most recent lock file from nightly pipeline

Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>

[https://nvbugs/5680310][fix] Fix ctx only timed out test (NVIDIA#9410)

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

[https://nvbugs/5547414][fix] enable case after using local cache model (NVIDIA#9473)

Signed-off-by: Hui Gao <huig@nvidia.com>

[None][fix] Replace PYTORCH_CUDA_ALLOC_CONF with PYTORCH_ALLOC_CONF to fix deprecation warning (NVIDIA#9294)

Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>

[https://nvbugs/5698581][fix] Init draft tokens for CUDA graph dummy request (NVIDIA#9505)

Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>

[None][infra] Waive failed case in pre-merge on 11/27 (NVIDIA#9507)

Signed-off-by: qqiao <qqiao@nvidia.com>

[TRTLLM-9513][docs] Qwen3 deployment guide (NVIDIA#9488)

Signed-off-by: Lanyu Liao <laliao@laliao-mlt.client.nvidia.com>
Co-authored-by: Lanyu Liao <laliao@laliao-mlt.client.nvidia.com>

[None][chore] revert batch_size=1 to prevent timeout and lower accuracy reference by 0.12% as a WAR (NVIDIA#9447)

Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>

[TRTLLM-9279][infra] Use flexcache for gh200 nodes since they locate in Austin (NVIDIA#9405)

Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>

[cherry-pick][https://nvbugs/5670793][fix] Solve trtllm-serve launch_disaggregated issue (NVIDIA#9346)

Signed-off-by: xxi <xxi@nvidia.com>

[None][infra] Fix Slurm job script (NVIDIA#9508)

Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>

[None][fix] change allreduce workspace dtype to torch.int64 to avoid overflow (NVIDIA#9479)

Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>

[None][feat] add qwen3-next CI test of accuracy on BF16 and NVFP4 (NVIDIA#9330)

Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>

[None][fix] fix TP support for DeepSeek-V3.2 on hopper (NVIDIA#9484)

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

[TRTLLM-9389][chore] Refactor AlltoallMethodType. (NVIDIA#9388)

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

[https://nvbugs/5674665][chore] Add test coverage for https://nvbugspro.nvidia.com/bug/5674665 (NVIDIA#9518)

Signed-off-by: eopXD <yuehtingc@nvidia.com>

[TRTLLM-7288][infra] Download merged waive list in slurm script (NVIDIA#8999)

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>

[https://nvbugs/5687820][fix] Remove self.abort() in DetokenizedGenerationResult (NVIDIA#9449)

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

[NVIDIA#9150][feat] AutoDeploy Nemotron-Flash support (NVIDIA#9504)

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

[None] [chore] Update to cutlass 4.3 (NVIDIA#8637)

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

[https://nvbugs/5637037][chore] Update waive lists. (NVIDIA#9386)

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

[None][infra] Check in most recent lock file from nightly pipeline

Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>

[TRTLLM-8970][infra] Fix generate report when has isolation test result (NVIDIA#8861)

Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>

[https://nvbugs/5685015][fix] Update invalid max_token test (NVIDIA#9435)

Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>

[None][fix] Fix on-disk cache and revise logger/statistics for AutoTuner. (NVIDIA#9211)

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

[https://nvbugs/5689658][test] Fix gpu lock issue running on cluster (NVIDIA#9441)

Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>

[None][chore] add spec_decoding configs in perf benchmark scripts and fix typos (NVIDIA#9533)

Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>

[None][fix] Remove FP8 K/V buffer from TRTLLM sparse MLA attention kernel (NVIDIA#9529)

Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>

[None] [chore] Enhancements and clean up to slurm scripts (NVIDIA#9493)

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

[None][chore] Revert "[None][fix] change allreduce workspace dtype to torch.int64 t… (NVIDIA#9538)

Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>

[None][infra] Waive failed cases for main branch on 11/28 (NVIDIA#9539)

Signed-off-by: qqiao <qqiao@nvidia.com>

[None][fix] Pass checkpoint_format to create_input_processor (NVIDIA#9521)

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

[TRTLLM-9541][infra] Use artifactory mirror for download.pytorch.org (NVIDIA#9477)

Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>

[TRTLLM-9488][feat] add 'disable_flashinfer_sampling' config option (NVIDIA#9454)

Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

[None][infra] Waive failed case in pre-merge on 11/28 (NVIDIA#9537)

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

[None][perf] Helix: improve all-to-all perf for large CP size (NVIDIA#9494)

Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
Co-authored-by: Zheyu Fu <zheyuf@nvidia.com>

[None][feat] support for more accurate AR calculation (NVIDIA#9323)

Signed-off-by: binghanc <176802681+binghanc@users.noreply.github.com>

[TRTLLM-9488][fix] llmapi references (NVIDIA#9547)

Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

[NVIDIA#8948][feat] Support custom sharding config (NVIDIA#9143)

Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>

[None][infra] Check in most recent lock file from nightly pipeline

Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>

[None][chore] Weekly mass integration of release/1.1 -- rebase (NVIDIA#9522)

Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: qgai <qgai@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Vincent Zhang <vinczhang@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Co-authored-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: sunnyqgg <159101675+sunnyqgg@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Vincent Zhang <vcheungyi@163.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Leslie Fang <leslief@nvidia.com>
Co-authored-by: Shunkangz <182541032+Shunkangz@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>

[TRTLLM-5971][feat] Integrate helix parallelism (NVIDIA#9342)

Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

[None][infra] Check in most recent lock file from nightly pipeline

Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>

[None][infra] - Request idle time exemption for OCI jobs (NVIDIA#9528)

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

[None][infra] Waive failed tests for main branch on 11/30 (NVIDIA#9555)

Signed-off-by: qqiao <qqiao@nvidia.com>

[None][fix] Fix port conflict in disagg tests (NVIDIA#9474)

Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>

[None][ci] Split H100_PCIe-PyTorch-Post-Merge test stage (NVIDIA#9558)

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

[None][ci] Split H100_PCIe-PyTorch-Post-Merge test stage (NVIDIA#9559)

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

[TRTLLM-8958][feat] and [TRTLLM-8960]: create ConfigurableMoE and support TRTLLMGenFusedMoE as backend (NVIDIA#9486)

[None] [feat] Optimize the algorithm part of RocketKV (NVIDIA#9333)

Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>

[https://nvbugs/5690172][fix] Fix Qwen3-235B ATP accuracy issue with PDL (NVIDIA#9530)

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

[TRTLLM-6222][feat] Extend cute_dsl_nvfp4_gemm to sm103. (NVIDIA#9543)

Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>

[None][fix] Correct virtual memory allocation alignment (NVIDIA#9491)

Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>

[None][infra] Check in most recent lock file from nightly pipeline

Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>

[https://nvbugs/5684703][fix] Unwaive disagg guided decoding test (NVIDIA#9466)

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

[https://nvbugs/5503479][fix] Temporarily lower reference accuracy to stabilize CI (NVIDIA#9398)

Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>

[None][chore] remove qwen3-next accuracy tests (NVIDIA#9534)

Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>

[None][doc] fix mtp.py typo (NVIDIA#9307)

Signed-off-by: liugaoji <757394026@qq.com>

[None][feat] add chat template kwargs support to longbench-v2 (NVIDIA#9544)

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

[NVIDIA#9496][fix] AutoDeploy: remove auto-tuner from nvfp4_gemm forward (NVIDIA#9497)

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>

[None][fix] Replace hash method with unique_id for cutedsl MoE runners. (NVIDIA#9569)

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

[None][chore] refactor disaggregated scripts to use named arguments (NVIDIA#9581)

Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>

[TRTLLM-6222][feat] Several perf opt for cuteDSL nvf4 gemm (NVIDIA#9428)

Signed-off-by: Yuhan Li <51736452+liyuhannnnn@users.noreply.github.com>

[None][chore] reduce the layers of the `devel` docker image (NVIDIA#9077)

Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>

[https://nvbugs/5651854][infra] Enable perf metrics during accuracy testing (NVIDIA#9140)

[None][fix] Skip Allreduce init for Attention DP (NVIDIA#9542)

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

[None][test] Waive main branch test failures 12/1 (NVIDIA#9566)

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

[None][ci] Minor change for Slurm scripts (NVIDIA#9561)

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

[TRTLLM-6768][infra] Fix params for not updating github status (NVIDIA#6747)

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

[None][infra] Update the pytest options after MI (NVIDIA#9579)

Signed-off-by: qqiao <qqiao@nvidia.com>

[TRTLLM-6756][feat] Add Beam Search to TorchSampler (NVIDIA#8509)

Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>

[None][chore] Defer exposing context parallel configs (NVIDIA#9552)

Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

[TRTC-1943][feat] Env vars override support in LLM API (NVIDIA#9104)

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>

[None][feat] AutoDeploy: Use the router gemm op for nemotron MOE (NVIDIA#9500)

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

[NVIDIA#9198][feat] Refactor dist ops in AutoDeploy (NVIDIA#9301)

Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>

[None][fix] Prevent YAML partial kv_cache_config from incorrectly overriding the complete kv_cache_config (NVIDIA#9262)

Signed-off-by: Yuening Li <62227368+Yuening-wa@users.noreply.github.com>

[TRTLLM-9085][doc] fix math formula rendering issues in github (NVIDIA#9605)

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

[None][feat] Unify nvfp4 gemm backend (NVIDIA#8963)

Signed-off-by: Shijie Wang <jaywan@nvidia.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Shijie <jaywan@nvidia.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>

[None][feat] Add support for KVCache reuse for DSv32 (NVIDIA#9383)

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

[None][infra] Check in most recent lock file from nightly pipeline

Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>

[None][chore] Polish qwen3-next modeling code. (NVIDIA#8902)

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

[https://nvbugs/5703953][fix] Use random port for disagg tests (NVIDIA#9582)

Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>

[None][fix] Waive gb200 (NVIDIA#9580)

Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

[FMDL-1328][feat] Add support for nano-v3 and super-v3 with pytorch backend (NVIDIA#9261)

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

[https://nvbugs/5582091][test] increase warmup times in testing for multi-gpu cases (NVIDIA#9578)

Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>

[None][chore] Add failed cases into waives.txt (NVIDIA#9588)

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

[https://nvbugs/5702793][fix] Fix uncontiguous tensor view (NVIDIA#9576)

Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>

[None][infra] Waive failed cases for main branch (NVIDIA#9615)

Signed-off-by: qqiao <qqiao@nvidia.com>

[TRTLLM-9488][feat] use FlashInfer.sampling by default (NVIDIA#9545)

Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

[None][infra] Update allowlist 2025/12/01 (NVIDIA#9616)

Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>

[None][infra] Remove an invalid test name in waives.txt (NVIDIA#9620)

Signed-off-by: qqiao <qqiao@nvidia.com>

Lock the gpu clocks in L0 perf tests (NVIDIA#9585)

Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>

[TRTLLM-9466][test] Evaluate helix parallelism with DSV3 Lite (NVIDIA#9597)

Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

[None][fix] Extract GPU count from single-node stage names (NVIDIA#9599)

Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>

[https://nvbugs/5667774][fix] Refine Piecewise Cuda Graph Condition for DP (NVIDIA#9393)

Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>

[TRTLLM-9144][fix] enhance RPC robustness (NVIDIA#8711)

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>

[https://nvbugs/5627710][fix] Fix synchronization bugs in KvCacheTransferManager that can cause corrupted blocks (NVIDIA#9056)

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

[TRTLLM-8980][test] Clean up spec dec tests in test_llm_api_pytorch (NVIDIA#8889)

Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>

[NVIDIA#9150][feat] Add code for nano v3 to custom implementation in AD (NVIDIA#9465)

* Why?

We would like to show an alternative to monkey-patching in AutoDeploy.

* What?

This commit builds on the existing custom model implementation for
NemotronH and adds the bits relevant for MoE layers.

Part of NVIDIA#9150.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

[NVIDIA#9150][feat] AutoDeploy: reviewer comments for NVIDIA#9150 (NVIDIA#9527)

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

[https://nvbugs/5651854][fix] Fix dist-serving perf by clearing CPU affinity (NVIDIA#9549)

Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

[NVIDIA#9550][feat] AutoDeploy: Add NVFP4 Cutlass MoE kernels  (NVIDIA#9551)

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>

[https://nvbugs/5688388][fix] Reduce num requests in disagg test to speed it up (NVIDIA#9598)

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

[TRTLLM-8946][feat] Improved heuristics to detect shardable regions (NVIDIA#9200)

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

[NVIDIA#9632][feat] Support EXTRA_WHEEL_BUILD_ARGS during wheel build (NVIDIA#9633)

Signed-off-by: Yu Chi Li <yuchil@nvidia.com>

[None][chore] Waive test failing on pre-merge (NVIDIA#9638)

Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

[None][chore] Remove traceback dump for multimodal input processor (NVIDIA#9634)

Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>

[None][chore] Fix trtllm-eval and move GroupedGemmInputsHelper (NVIDIA#9612)

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

[https://nvbugs/5698434][fix] Use separate weight mapper for draft (NVIDIA#9607)

Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>

[TRTLLM-7101][infra] Reuse passed tests (NVIDIA#6894)

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>

[None][test] Remove duplicate test cases (NVIDIA#9623)

Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>

[None][infra] Check in most recent lock file from nightly pipeline

Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>

[None][feat] Add RocketKV usage doc and e2e accuracy test on LongBenchV2 (NVIDIA#9572)

Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>

[TRTLLM-9242][doc] Add examples showcasing openai compatible APIs (NVIDIA#9520)

Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>

[None][chore] AutoDeploy update cuda stream manager for multi-device (NVIDIA#9575)

Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>

[TRTLLM-9391][chore] Automatically estimate required workspace. (NVIDIA#9535)

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

[https://nvbugs/5708475][fix] Fix e2e eval accuracy for helix parallelism (NVIDIA#9647)

Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

[https://nvbugs/5561153][test] Fix log error for perf test (NVIDIA#9622)

Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>

[TRTLLM-8241][feat] Aliasing to comply to LlmArgs (NVIDIA#9586)

Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

[None][chore] Add failed cases into waives.txt (NVIDIA#9593)

Signed-off-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <lijie@nvidia.com>

[TRTLLM-6842][feat] Support Response API for general purpose (NVIDIA#9392)

Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>

[None][test] Update Qwen3-next accuracy testing by setting the cuda … (NVIDIA#9613)

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

[None][feat] update trtllm-gen nvfp4 kernels with better performance (NVIDIA#9510)

Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

[None][doc] Replace the tensorrt icon with torch icon on overview.md (NVIDIA#9644)

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

[https://nvbugs/5705197][chore] Unwaive timeout disagg tests (NVIDIA#9637)

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

[https://nvbugs/5552132][fix] Enable LoRa for GPT OSS Torch (NVIDIA#8253)

Signed-off-by: Michal Guzek <mguzek@nvidia.com>

[None][fix] Fix wide ep MoE error (NVIDIA#9642)

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

[https://nvbugs/5702795][fix] Remove the warning message for aten.log. (NVIDIA#9665)

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

[https://nvbugs/5693853][fix] Fix error handling when querying machin… (NVIDIA#9483)

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>

[OMNIML-2932] [feat] nvfp4 awq support (NVIDIA#8698)

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

[NVIDIA#9643][fix] AutoDeploy: fix nano sharding config (NVIDIA#9668)

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

[NVIDIA#9147][feat] AutoDeploy: Draft Target Speculative Decoding (NVIDIA#9275)

Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>

[None][feat] Update Qwen3CodeToolParser to align tool-calling parameters (NVIDIA#9540)

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

[TRTLLM-7181][infra] Generate test results when pytest timeout happens (NVIDIA#9396)

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

[None][infra] Check in most recent lock file from nightly pipeline

Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>

[TRTLLM-9522][fix] restore `trtllm-serve mm_embedding_serve` (NVIDIA#9669)

[TRTLLM-5093][infra] Write env variables to a file in the interactive debug session (NVIDIA#6792)

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

[None][fix] fix error when processing batches containing both text and mm data (NVIDIA#8381)

Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>

[TRTLLM-7073][feat] Support torch compile for PP for Llama and DeepSeekV3 (NVIDIA#7838)

Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>

[None][feat] Add weights initialization and context phase parser to layer-wise benchmarks (NVIDIA#9667)

Signed-off-by: Tailing Yuan <yuantailing@gmail.com>

[TRTLLM-8274][feat] Check if executor is shutdown in /health entrypoint (NVIDIA#9057)

Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>

[NVIDIA#8733][feat] Add Llama4 MoE handling to AutoDeploy (NVIDIA#9556)

Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>

[None][ci] unwaive tests (NVIDIA#9651)

Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>

[None][feat] Add NIXL-LIBFABRIC support (NVIDIA#9225)

Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
Signed-off-by: zackyoray <yorayz@nvidia.com>

[None][test] rename wide ep and disagg metric name in perf test (NVIDIA#9704)

Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>

[https://nvbugs/5467531][fix] Unwaive fused_moe all to all test with … (NVIDIA#9617)

Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>

[None][fix] Recover TRTLLM MoE Perf for DEP (NVIDIA#9562)

Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>

[None][chore] Add failed cases into waives.txt (NVIDIA#9662)

Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>

[None][fix] Fix TLLM_SPEC_DECODE_FORCE_NUM_ACCEPTED_TOKENS for MTP/EAGLE (NVIDIA#9608)

Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>

[None][infra] Add container notices and documentation (NVIDIA#9185)

Signed-off-by: Parker Drake <pdrake@nvidia.com>

[TRTLLM-5312][infra] Add triton trigger rules (NVIDIA#6440)

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

[None][doc] Add feature docs for helix parallelism (NVIDIA#9684)

Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

[TRTLLM-9579][infra] Set mergeWaiveList stage UNSTABLE when there is any issue (NVIDIA#9692)

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

[None][doc] Added line about partial reuse (NVIDIA#7846)

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

[TRTLLM-8920][feat] decouple disagg service from fastapi (NVIDIA#8714)

Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>

[https://nvbugs/5633340][fix] start disagg workers and servers on free ports (NVIDIA#9694)

Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>

[TRTLLM-9562] [doc] Add Deployment Guide for Kimi K2 Thinking on TensorRT LLM - Blackwell (NVIDIA#9711)

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

[NVIDIA#9602][feat] AutoDeploy: Support TRTLLM Sampler (NVIDIA#9641)

Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>

[None][infra] Check in most recent lock file from nightly pipeline

Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>

[None] [tests] Unwaive EPLB tests (NVIDIA#9625)

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

[https://nvbugs/5518713][test] Refactor core test lists by merging with llm_perf_cluster.yml (NVIDIA#9714)

Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>

[TRTLLM-7136][feat] Update load_weights method to include mapping parameter in checkpoint loaders (NVIDIA#9583)

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

[None][refactor] Improve request processing function in sampler (NVIDIA#9671)

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

[https://nvbugs/5670672][fix] Fix flaky KV connector tests (NVIDIA#9676)

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

[None][infra] Update allowed list 20251204 (NVIDIA#9718)

Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>

[None][feat] AutoDeploy: Perf optimization for Attention and rmsnorm (NVIDIA#9719)

Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

[None][chore] Waive flakey disagg tests (NVIDIA#9749)

Signed-off-by: Mike Iovine <miovine@nvidia.com>

[https://nvbugs/5601682][fix] Fix cacheTransceiver hang (NVIDIA#9311)

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>

[TRTLLM-9199][docs] KV Connector Docs (NVIDIA#9325)

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>

[TRTLLM-9160][doc] add doc to llm_runtime.py (NVIDIA#9482)

Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>

[None][doc] VDR 1.0 trtllm-serve doc enhancement (NVIDIA#9443)

Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>

[TRTLLM-9086][doc] Clean up TODOs in documentation (NVIDIA#9292)

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>

[TRTLLM-9157][doc] Guided decoding doc improvement (NVIDIA#9359)

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>

[None][infra] Updated Linux installation guide (NVIDIA#9485)

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>

[TRTLLM-9075][doc] refine the slurm examples (NVIDIA#9548)

Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>

[TRTLLM-9093][doc] update hyper links in overview (NVIDIA#9568)

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>

[TRTLLM-9092][doc] link to modelopt checkpoints in quick start guide (NVIDIA#9571)

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>

[None][infra] Check in most recent lock file from nightly pipeline

Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>

[None][fix] Fix triton moe load_weight (NVIDIA#9649)

Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>

[None][fix] fix a bug: deepseek_fp8_block_scales in TRTLLMGEN-MoE use 2D x_sf instead of 1D (NVIDIA#9658)

Signed-off-by: xxi <xxi@nvidia.com>

[TRTLLM-9372][feat] Enable CuteDSL MoE with Large EP (NVIDIA#9592)

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

[TRTLLM-9522][chore] implement default `attach_multimodal_embeddings` (NVIDIA#9664)

Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

[TRTLLM-9660][feat] Convert cuteDSL GEMM to opt-in feature (NVIDIA#9682)

Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

[None][fix] enable hmac in RPC (NVIDIA#9745)

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

[None][infra] Check in most recent lock file from nightly pipeline

Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>

[https://nvbugs/5703953][fix] Preserving ip:port for trtllm-serve before initializing llm (NVIDIA#9646)

Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>

[None][infra] Waive failed cases for main branch on 12/07 (NVIDIA#9769)

Signed-off-by: qqiao <qqiao@nvidia.com>

[None][fix] Several minor fixes to CI setting (NVIDIA#9765)

Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>

[OMNIML-3036][doc] Re-branding TensorRT-Model-Optimizer as Nvidia Model-Optimizer (NVIDIA#9679)

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>

[None][feat] Enable NCCL_SYMMETRIC as default fallback for AllReduce (NVIDIA#9314)

Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>

[TRTLLM-9000][feat] Add multi-node Perf Tests into CI (NVIDIA#8800)

Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>

[None][test] add ntp tolerance in time metrics verification (NVIDIA#9741)

Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>

[TRTLLM-9603][feat] Enable ConfigurableMoE test in the CI (NVIDIA#9645)

[https://nvbugs/5422621][test] Add GB 200 WIDEEP test case for RCCA 5422621 (NVIDIA#9506)

Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>

[None][fix] Fix two tuning cache miss issues. (NVIDIA#9743)

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

[None][infra] Check in most recent lock file from nightly pipeline

Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>

[TRTLLM-9706] [doc] Update wide EP documents (NVIDIA#9724)

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

[https://nvbugs/5666804][test] only adding sampler config for limited models (NVIDIA#9512)

Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>

[None][infra] Waive failed cases for main on 12/08 (NVIDIA#9773)

Signed-off-by: qqiao <qqiao@nvidia.com>

[None][chore] Move the rocketkv e2e test to post-merge (NVIDIA#9768)

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

[None][chore] Enable tvm_ffi for cute dsl nvfp4_gemm to reduce host overhead. (NVIDIA#9690)

Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>

[TRTLLM-9431][perf] Enable multistream for Linear Attention in Qwen3-… (NVIDIA#9696)

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

[None][chore] Remove closed bugs (NVIDIA#9770)

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

[None][infra] update mooncake in docker images (NVIDIA#9584)

Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>

[None][test] Add Kimi k2 WIDEEP perf and accuracy cases (NVIDIA#9686)

Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

[https://nvbugs/5527655][test] Add test case for RCCA 5527655 (NVIDIA#9511)

Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>

[http://nvbugs/5649010][fix] fix test_auto_scaling.py::test_worker_restart timeout (NVIDIA#9775)

Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>

[None][fix] Switch AutoDeploy's default allreduce strategy to NCCL (NVIDIA#9666)

Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>

[TRTLLM-9506][fix] Fix AR for DeepSeek-R1 2-model path (NVIDIA#9661)

Signed-off-by: qgai <qgai@nvidia.com>

ray + updatew works

trtllm works in async env

trtllm works in sync and async env

rebase to the updated verl

server mode

still cherry pick

integrated http interface

hang at RayExecutor create workers ray.remote

clean code

use tensorrt_llm.rlhf_utils

Signed-off-by: Liwei Ma <liweim@nvidia.com>

placement, asyncllm, and basic tests
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

connect sleep and wakeup; Add support to pass None to update_weights
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

Batching ctx for IFB scheduler

Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>

accuracy WAR for TP>1: always use AllReduceStrategy.NCCL, refactored
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

fix e2e integration

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

update asyncllm, other nits
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

fix init setup

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

Fix TRTLLMSampler logprobs perf

Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>

fix and cleanup
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

fix server

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>

Revert "Batching ctx for IFB scheduler"

This reverts commit b51aac0

Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>

update & address comments

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
codego7250 pushed a commit to codego7250/TensorRT-LLM that referenced this pull request Dec 11, 2025
…ases (NVIDIA#9356)

Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>