
Conversation

@Shixiaowei02
Collaborator

@Shixiaowei02 Shixiaowei02 commented Jul 24, 2025

Summary by CodeRabbit

  • Refactor

    • Standardized cache transceiver backend identifiers to uppercase: DEFAULT, UCX, NIXL, MPI. Configurations and inputs were updated accordingly.
  • Tests

    • Updated unit and integration tests to use uppercase backend values across disaggregated configs and accuracy suites.
  • Chores

    • Aligned example configs and YAML generation to the new uppercase backend identifiers for consistency.

@Shixiaowei02 Shixiaowei02 requested a review from a team as a code owner July 24, 2025 05:59
@Shixiaowei02 Shixiaowei02 requested a review from nv-guomingz July 24, 2025 05:59
@coderabbitai
Contributor

coderabbitai bot commented Jul 24, 2025

📝 Walkthrough

Updated CacheTransceiverConfig.backend annotation to use uppercase Literal values ("DEFAULT","UCX","NIXL","MPI") and normalized all tests, example configs, and YAML fixtures to use the matching uppercase backend strings. No runtime logic, validation, or control-flow changes.

Changes

  • Core API (tensorrt_llm/llmapi/llm_args.py): Change the type hint for CacheTransceiverConfig.backend from Optional[Literal["default","ucx","nixl","mpi"]] to Optional[Literal["DEFAULT","UCX","NIXL","MPI"]].
  • Disaggregated test YAML configs (tests/integration/defs/disaggregated/test_configs/*): Replace backend literals with uppercase across many YAMLs (e.g., "default"→"DEFAULT", "ucx"→"UCX", "nixl"→"NIXL", "mpi"→"MPI").
  • Accuracy tests, inline configs (tests/integration/defs/accuracy/test_disaggregated_serving.py): Update inline cache_transceiver_config.backend string constants to uppercase (multiple test cases).
  • Disaggregated tests, Python fixtures (tests/integration/defs/disaggregated/test_disaggregated_etcd.py, tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py): Update hard-coded YAML content and test constructor arguments to use uppercase backend strings.
  • Unit tests, LLM args (tests/unittest/llmapi/test_llm_args.py): Adjust test inputs and assertions to use uppercase backend values (e.g., "ucx"→"UCX").
  • Example configs (examples/disaggregated/disagg_config.yaml): Change cache_transceiver_config.backend occurrences to uppercase.
  • Slurm generator (examples/disaggregated/slurm/benchmark/gen_yaml.py): Update the generated backend strings from lowercase to uppercase.
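All of the cohorts above reduce to the same casing migration. As a hedged sketch (the helper name and the standalone `Backend` alias are illustrative, not from the PR; only the four backend strings come from the summary above), downstream scripts could upgrade legacy lowercase values like this:

```python
from typing import Literal, Optional, get_args

# Mirrors the updated annotation described above; the alias name is ours.
Backend = Optional[Literal["DEFAULT", "UCX", "NIXL", "MPI"]]

# Recover the allowed strings at runtime from the Literal itself,
# so this set can never drift from the annotation.
_literal = get_args(Backend)[0]          # the Literal[...] member of the Optional
ALLOWED = frozenset(get_args(_literal))  # {"DEFAULT", "UCX", "NIXL", "MPI"}

def migrate_backend(value: Optional[str]) -> Optional[str]:
    """Uppercase a legacy lowercase backend string, rejecting unknown values."""
    if value is None:
        return None
    upper = value.upper()
    if upper not in ALLOWED:
        raise ValueError(f"unknown cache transceiver backend: {value!r}")
    return upper
```

A one-off script applying this to existing YAML files would cover the config churn listed in the table.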

Sequence Diagram(s)

(omitted — changes are declarative value/casing updates only, no control-flow or feature behavior to diagram)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15 minutes

Suggested reviewers

  • nv-guomingz
  • qiaoxj07
  • Tabrizian
  • pcastonguay
  • chzblych


@coderabbitai coderabbitai bot requested review from kaiyux and pcastonguay July 24, 2025 05:59
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
tensorrt_llm/llmapi/llm_args.py (2)

863-866: Consider the trade-off between compile-time safety and runtime flexibility.

Changing from Optional[Literal["default", "ucx", "nixl", "mpi"]] to Optional[str] removes compile-time type checking. While the runtime validator compensates for this, invalid values will only be caught at runtime rather than during development or static analysis.

Consider if the flexibility gained (case-insensitive validation) is worth losing compile-time safety. If case-insensitive validation is the primary goal, you could alternatively keep the Literal type and add a field validator that converts input to lowercase before validation.


872-881: Good validator implementation with minor considerations.

The validator correctly implements case-insensitive validation with clear error messaging. A few observations:

  1. The order of valid backends differs from the original Literal order (["default", "ucx", "nixl", "mpi"] vs ["default", "ucx", "mpi", "nixl"]) - consider maintaining consistency.

  2. The truthy check if self.backend: will skip validation for empty strings, which may be intended but differs from None handling.

Consider sorting the valid_backends list alphabetically for consistency:

-        valid_backends = ["default", "ucx", "mpi", "nixl"]
+        valid_backends = ["default", "mpi", "nixl", "ucx"]
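Both observations can be folded into one validator. Below is a dependency-free sketch, with a plain dataclass standing in for the actual pydantic model (pydantic's field validator would play the same role), that sorts the valid list, distinguishes None from the empty string rather than relying on a truthy check, and normalizes case:

```python
from dataclasses import dataclass
from typing import Optional

VALID_BACKENDS = ["default", "mpi", "nixl", "ucx"]  # sorted alphabetically

@dataclass
class CacheTransceiverConfig:
    backend: Optional[str] = None

    def __post_init__(self) -> None:
        # Explicit None check: an empty string is NOT silently skipped,
        # unlike the truthy `if self.backend:` form flagged above.
        if self.backend is not None:
            normalized = self.backend.lower()
            if normalized not in VALID_BACKENDS:
                raise ValueError(
                    f"Invalid backend: {self.backend!r}; "
                    f"expected one of {VALID_BACKENDS}")
            self.backend = normalized
```

With this shape, "UCX" and "ucx" are both accepted and normalized, while "" fails fast with a clear message.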
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 428e340 and 7ce3cce.

📒 Files selected for processing (2)
  • examples/disaggregated/README.md (2 hunks)
  • tensorrt_llm/llmapi/llm_args.py (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
tensorrt_llm/llmapi/llm_args.py (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

examples/disaggregated/README.md (1)

Learnt from: yechank-nvidia
PR: #6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using from_shared_tensor() is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call strip_for_generation() to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.

🔇 Additional comments (3)
examples/disaggregated/README.md (3)

19-41: Excellent documentation improvement!

The restructuring significantly enhances clarity by:

  • Presenting configuration file contents in proper YAML format instead of inline shell commands
  • Using shorter, more intuitive file names (ctx_extra-llm-api-config.yml vs context_extra-llm-api-config.yml)
  • Moving the overlap scheduler explanation into the YAML as a contextual comment
  • Clearly separating configuration file creation from server startup commands

This makes the setup process much more straightforward for users.


44-51: Commands updated consistently with new config file names.

The server startup commands correctly reference the new config file names while preserving all functional parameters. The clear section headers improve readability.


113-114: Dynamic scaling commands updated consistently.

The commands correctly use the new config file names while maintaining all functional parameters for the dynamic scaling feature.

@Shixiaowei02 Shixiaowei02 force-pushed the user/xiaoweis/doc branch 2 times, most recently from fd64e14 to 836311d on July 24, 2025 06:05
@Shixiaowei02
Collaborator Author

/bot --help

@github-actions

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@Shixiaowei02
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #12812 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #12812 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9546 completed with status: 'FAILURE'

@Shixiaowei02
Collaborator Author

/bot run

@Shixiaowei02 Shixiaowei02 changed the title fix: mismatch between docs and actual commands fix: [BREAKING CHANGE] Mismatch between docs and actual commands Jul 28, 2025
@coderabbitai coderabbitai bot added the "Community want to contribute" (PRs initiated from Community) and "Doc" (TRTLLM's textual/illustrative materials: API refs, guides, tutorials; improvement & clarity) labels Jul 28, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (1)
examples/disaggregated/README.md (1)

126-131: Fix typo: refersh_interval → refresh_interval

The key name in the metadata-server config is misspelled, which will break parsing at runtime.

-refersh_interval: 10.0
+refresh_interval: 10.0
🧹 Nitpick comments (2)
benchmarks/cpp/README.md (1)

338-342: Add language identifier to fenced block

The fenced block that contains the export and mpirun commands is missing a language hint, triggering MD040 warnings and losing syntax highlighting.

-```
+```bash
 export TRTLLM_USE_UCX_KVCACHE=1
 mpirun -n ${proc} benchmarks/disaggServerBenchmark --context_engine_dirs ${context_engine_0},${context_engine_1}...,${context_engine_{m-1}} \
 --generation_engine_dirs ${generation_engine_0},${generation_engine_1}...,${generation_engine_{n-1}} --dataset ${dataset_path}

examples/disaggregated/README.md (1)

96-100: Specify language for the client invocation block

This fenced block is missing a language specifier, again tripping MD040. Add bash (or shell) so linters are quiet and readers get highlighting.

-```
+```bash
 python3 ./clients/disagg_client.py -c disagg_config.yaml -p ./clients/prompts.json -e chat

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 836311d189385e626704ffeda5a906180f45fdbe and 8a5d19719e65ed19d683926c26c8f54dde31d019.

📒 Files selected for processing (31)

  • benchmarks/cpp/README.md (1 hunks)
  • docs/source/advanced/disaggregated-service.md (0 hunks)
  • examples/cpp/executor/README.md (1 hunks)
  • examples/disaggregated/README.md (3 hunks)
  • examples/disaggregated/slurm/gen_yaml.py (2 hunks)
  • tensorrt_llm/llmapi/llm_args.py (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse_deepseek_v3.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_conditional_deepseek_v3.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_two_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cuda_graph_padding.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_trt_backend.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_load_balance.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_mixed.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_trt_backend.yaml (1 hunks)

💤 Files with no reviewable changes (1)

  • docs/source/advanced/disaggregated-service.md

✅ Files skipped from review due to trivial changes (28)

  • tests/integration/defs/disaggregated/test_configs/disagg_config_conditional_deepseek_v3.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_mixed.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_two_mtp.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml
  • examples/disaggregated/slurm/gen_yaml.py
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse_deepseek_v3.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cuda_graph_padding.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_overlap.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_trt_backend.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_trt_backend.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_load_balance.yaml
  • examples/cpp/executor/README.md
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml
  • tensorrt_llm/llmapi/llm_args.py


🧰 Additional context used
🧠 Learnings (2)
benchmarks/cpp/README.md (1)

Learnt from: yechank-nvidia
PR: NVIDIA/TensorRT-LLM#6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.

examples/disaggregated/README.md (2)

Learnt from: yechank-nvidia
PR: NVIDIA/TensorRT-LLM#6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.

Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

🪛 markdownlint-cli2 (0.17.2)

benchmarks/cpp/README.md

346-346: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

examples/disaggregated/README.md

28-28: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

---

45-45: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

---

113-113: Fenced code blocks should have a language specified

(MD040, fenced-code-language)



@tensorrt-cicd
Collaborator

PR_Github #13175 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #13175 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9867 completed with status: 'FAILURE'

@tensorrt-cicd
Collaborator

PR_Github #14990 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11319 completed with status: 'FAILURE'

@Shixiaowei02
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #15083 [ run ] triggered by Bot

@Shixiaowei02 Shixiaowei02 changed the title [None][fix] BREAKING CHANGE: Mismatch between docs and actual commands [TRTLLM-7030][fix] BREAKING CHANGE: Mismatch between docs and actual commands Aug 13, 2025
@tensorrt-cicd
Collaborator

PR_Github #15083 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11387 completed with status: 'FAILURE'

@Shixiaowei02
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #15113 [ run ] triggered by Bot

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (2)
examples/disaggregated/slurm/benchmark/gen_yaml.py (2)

9-10: Fix Python 3.8-incompatible generic annotations (use typing.Tuple instead of built-in tuple[]).

Our guidelines require Python 3.8+, but PEP 585 built-in generics (tuple[int, ...]) are only valid in 3.9+. This will raise at import time on 3.8 unless future annotations are enabled. Replace with typing.Tuple.

Apply this diff within the selected lines:

-def process_node_and_task() -> tuple[int, List[str], List[str]]:
+def process_node_and_task() -> Tuple[int, List[str], List[str]]:

Also add Tuple to the typing imports near Line 4 (outside the selected range):

from typing import Dict, List, Tuple

85-86: Fix Python 3.8-incompatible return annotation in generate_urls.

Same issue as above; change tuple[List[str], int] to Tuple[List[str], int].

Apply this diff within the selected lines:

-                  task_nodes_offset: int = 0) -> tuple[List[str], int]:
+                  task_nodes_offset: int = 0) -> Tuple[List[str], int]:

Ensure you’ve added:
from typing import Tuple
to the existing typing imports, as noted earlier.
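For reference, a minimal 3.8-compatible sketch of the two signatures (the bodies here are placeholders and the trimmed parameter list is illustrative; the real logic lives in gen_yaml.py):

```python
from typing import List, Tuple

def process_node_and_task() -> Tuple[int, List[str], List[str]]:
    # Placeholder body; the real function inspects the SLURM node/task layout.
    return 0, [], []

def generate_urls(task_nodes_offset: int = 0) -> Tuple[List[str], int]:
    # Placeholder body; the real function builds server URL lists.
    return [], task_nodes_offset
```

Unlike the PEP 585 built-in generics, typing.Tuple is subscriptable on Python 3.8, so these annotations evaluate cleanly at import time without `from __future__ import annotations`.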

♻️ Duplicate comments (2)
tests/unittest/llmapi/test_llm_args.py (2)

669-672: Fix Ruff F405 (star-import) and follow namespace import guideline for CacheTransceiverConfig.

Use the module namespace to avoid F405 and adhere to the coding guideline.

Apply this diff within the selected lines:

-        config = CacheTransceiverConfig(backend="UCX",
+        config = llm_args.CacheTransceiverConfig(backend="UCX",
             max_tokens_in_buffer=1024)
         assert config.backend == "UCX"

Add this import near the other imports at the top of the file (outside selected range):

import tensorrt_llm.llmapi.llm_args as llm_args

677-677: Fix Ruff F405 here as well (use namespaced class).

Mirror the change above for the invalid-argument case.

Apply this diff within the selected lines:

-            CacheTransceiverConfig(backend="UCX", invalid_config="should_fail")
+            llm_args.CacheTransceiverConfig(backend="UCX", invalid_config="should_fail")
🧹 Nitpick comments (3)
examples/disaggregated/slurm/benchmark/gen_yaml.py (1)

1-1: Add NVIDIA copyright header to comply with repository standards.

Per coding guidelines, all Python sources should include the NVIDIA header.

Add this at the top of the file:

# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml (2)

13-13: Nit: quote DEFAULT to keep YAML formatting consistent with other configs.

Other files quote the backend string; consistency reduces churn and ambiguity.

Apply this diff:

-    backend: DEFAULT
+    backend: "DEFAULT"

21-21: Nit: quote DEFAULT to keep YAML formatting consistent with other configs (generation_servers).

Same as the previous suggestion.

Apply this diff:

-    backend: DEFAULT
+    backend: "DEFAULT"
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9dde14 and 2ccc7e4.

📒 Files selected for processing (44)
  • examples/disaggregated/disagg_config.yaml (1 hunks)
  • examples/disaggregated/slurm/benchmark/gen_yaml.py (2 hunks)
  • tensorrt_llm/llmapi/llm_args.py (1 hunks)
  • tests/integration/defs/accuracy/test_disaggregated_serving.py (13 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse_deepseek_v3.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_conditional_deepseek_v3.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_genpp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_gentp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_two_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_mpi.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_ucx.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2pp2_gentp2pp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cuda_graph_padding.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_diff_max_tokens.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_trt_backend.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_load_balance.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_mixed.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_torch_sampler.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_trt_backend.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_disaggregated_etcd.py (1 hunks)
  • tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (3 hunks)
  • tests/unittest/llmapi/test_llm_args.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (39)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_trt_backend.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_torch_sampler.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse_deepseek_v3.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2pp2_gentp2pp2.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_conditional_deepseek_v3.yaml
  • tests/integration/defs/disaggregated/test_disaggregated_etcd.py
  • examples/disaggregated/disagg_config.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_genpp2.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_gentp2.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_diff_max_tokens.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_two_mtp.yaml
  • tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_mixed.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_overlap.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_ucx.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_mpi.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml
  • tests/integration/defs/accuracy/test_disaggregated_serving.py
  • tensorrt_llm/llmapi/llm_args.py
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cuda_graph_padding.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_load_balance.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml
  • tests/integration/defs/disaggregated/test_configs/disagg_config_trt_backend.yaml
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in `__init__`
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else

Files:

  • examples/disaggregated/slurm/benchmark/gen_yaml.py
  • tests/unittest/llmapi/test_llm_args.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header (current year) to all source files

Files:

  • examples/disaggregated/slurm/benchmark/gen_yaml.py
  • tests/unittest/llmapi/test_llm_args.py
🪛 Ruff (0.12.2)
tests/unittest/llmapi/test_llm_args.py

669-669: CacheTransceiverConfig may be undefined, or defined from star imports

(F405)


677-677: CacheTransceiverConfig may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (7)
examples/disaggregated/slurm/benchmark/gen_yaml.py (2)

198-201: LGTM: backend normalized to uppercase "DEFAULT" (context_servers).

Matches the updated Literal type and keeps docs/code aligned.


226-229: LGTM: backend normalized to uppercase "DEFAULT" (generation_servers).

Consistent with the rest of the PR’s uppercase normalization.
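
For context, the generated server sections presumably take a shape like the sketch below; `build_server_section` and the surrounding keys are assumptions, and only the uppercase `backend` value reflects the reviewed change:

```python
from typing import Any, Dict

def build_server_section(num_instances: int) -> Dict[str, Any]:
    # Hypothetical shape of one generated server section; only the
    # cache_transceiver_config entry mirrors the reviewed change.
    return {
        "num_instances": num_instances,
        "cache_transceiver_config": {
            # Uppercase per the new Literal["DEFAULT", "UCX", "NIXL", "MPI"].
            "backend": "DEFAULT",
        },
    }
```

Emitting the uppercase string at generation time keeps the YAML consumers and the Pydantic-side Literal in lockstep.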

tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml (3)

20-21: LGTM: backend set to "DEFAULT" (context_servers).

Conforms to the updated uppercase Literal contract.


36-37: LGTM: backend set to "DEFAULT" (generation_servers).

Consistent with the new typing and other configs.


1-1: No lingering lowercase backend values detected
A repository-wide search confirms that all backend entries use uppercase (DEFAULT, UCX, NIXL, MPI). There are no remaining lowercase occurrences of default, ucx, nixl, or mpi.
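
The repository-wide search described above can be sketched as a small script; the function name, file pattern, and regex are assumptions, not the reviewer's actual tooling:

```python
import re
from pathlib import Path
from typing import List

# Hypothetical sweep mirroring the search described above: flag any YAML
# line that still sets a lowercase backend value.
LOWERCASE_BACKEND = re.compile(r'backend:\s*"?(default|ucx|nixl|mpi)"?\s*$')

def find_lowercase_backends(root: str) -> List[str]:
    """Return "path:line: text" entries for every lowercase backend value."""
    hits = []
    for path in sorted(Path(root).rglob("*.yaml")):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if LOWERCASE_BACKEND.search(line):
                hits.append(f"{path}:{lineno}: {line.strip()}")
    return hits
```

An empty result over the test-config tree corresponds to the "no lingering lowercase values" finding.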

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml (2)

13-13: LGTM: backend normalized to DEFAULT (context_servers).

Matches the Literal["DEFAULT","UCX","NIXL","MPI"] requirement.


25-25: LGTM: backend normalized to DEFAULT (generation_servers).

Aligned with the rest of the PR.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15113 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11410 completed with status: 'FAILURE'

Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
@Shixiaowei02
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15142 [ run ] triggered by Bot

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/unittest/llmapi/test_llm_args.py (1)

669-672: Fix Ruff F405 and adhere to namespace-import guideline without wide refactor

Use a namespaced import for llm_args and reference CacheTransceiverConfig via that namespace to satisfy the “maintain module namespace” rule and remove F405 at these sites. Avoid alias “llm_args” to prevent shadowing local variables named llm_args in other tests.

Apply within the selected lines:

-        config = CacheTransceiverConfig(backend="UCX",
+        config = llm_args_mod.CacheTransceiverConfig(backend="UCX",
             max_tokens_in_buffer=1024)
         assert config.backend == "UCX"
...
-            CacheTransceiverConfig(backend="UCX", invalid_config="should_fail")
+            llm_args_mod.CacheTransceiverConfig(backend="UCX", invalid_config="should_fail")

Add this import near the other imports at the top of the file (outside the selected range):

import tensorrt_llm.llmapi.llm_args as llm_args_mod

Optionally, in a follow-up, consider replacing the star import `from tensorrt_llm.llmapi.llm_args import *` with explicit names or with the llm_args_mod namespace throughout the file to fully eliminate F405 risks elsewhere.

Also applies to: 677-677
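
The F405 class of problems can also be caught with a tiny AST check, independent of any linter; this is a sketch, not the project's tooling:

```python
import ast
from typing import List

def find_star_imports(source: str) -> List[str]:
    """Return the module names star-imported by a Python source string."""
    tree = ast.parse(source)  # parses only; never executes or imports
    return [
        node.module or ""
        for node in ast.walk(tree)
        if isinstance(node, ast.ImportFrom)
        and any(alias.name == "*" for alias in node.names)
    ]
```

Running it over a test file tells you whether any star import remains that could leave names like CacheTransceiverConfig implicitly defined.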

🧹 Nitpick comments (2)
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml (1)

12-12: Optional: unify YAML quoting style

Other configs use unquoted enum values; consider dropping quotes here for consistency. Both are valid YAML; this is purely stylistic.

Apply this minimal diff:

-    backend: "NIXL"
+    backend: NIXL

Also applies to: 20-20
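
Both spellings parse to the same string, so the choice is indeed cosmetic; the two documents below are equivalent after loading:

```yaml
# Quoted and unquoted forms both yield the string NIXL after parsing.
cache_transceiver_config:
  backend: "NIXL"
---
cache_transceiver_config:
  backend: NIXL
```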

tests/unittest/llmapi/test_llm_args.py (1)

669-669: Add NVIDIA copyright header (current year)

Per coding guidelines, prepend the standard NVIDIA header to Python sources.

Add at the very top of the file (outside the selected range):

# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ccc7e4 and e4a6504.

📒 Files selected for processing (44)
  • examples/disaggregated/disagg_config.yaml (1 hunks)
  • examples/disaggregated/slurm/benchmark/gen_yaml.py (2 hunks)
  • tensorrt_llm/llmapi/llm_args.py (1 hunks)
  • tests/integration/defs/accuracy/test_disaggregated_serving.py (13 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse_deepseek_v3.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_conditional_deepseek_v3.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_genpp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_gentp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_two_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_mpi.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_ucx.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2pp2_gentp2pp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cuda_graph_padding.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_diff_max_tokens.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_trt_backend.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_load_balance.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_mixed.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_torch_sampler.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_trt_backend.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_disaggregated_etcd.py (1 hunks)
  • tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (3 hunks)
  • tests/unittest/llmapi/test_llm_args.py (1 hunks)
🔇 Additional comments (6)
tests/integration/defs/disaggregated/test_configs/disagg_config_conditional_deepseek_v3.yaml (2)

21-21: Uppercased backend literal aligns with the new type annotation. LGTM.

Both cache_transceiver_config blocks now use DEFAULT, consistent with the updated Literal["DEFAULT","UCX","NIXL","MPI"].

Also applies to: 36-36


21-21: No leftover lowercase backend values found

The grep scan across YAML configs, Python code, and Markdown/docs returned only uppercase DEFAULT, UCX, MPI, and NIXL entries. There are no remaining lowercase default, ucx, nixl, or mpi occurrences.

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml (1)

14-14: Consistent enum casing change looks correct

Using DEFAULT for both context and generation servers matches the updated enum and keeps configs consistent.

Also applies to: 24-24

tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml (1)

18-18: LGTM: backend enum normalized to uppercase

DEFAULT here is consistent with the new Literal and with other updated fixtures.

Also applies to: 33-33

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml (1)

12-12: Correct NIXL uppercase enum

Switching from "nixl" to "NIXL" aligns with the new Literal. Functionally correct.

Also applies to: 20-20

tests/unittest/llmapi/test_llm_args.py (1)

669-672: LGTM on enum casing in tests

Updating "ucx" -> "UCX" aligns the test with the new Literal contract and expected values.

Also applies to: 677-677
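
The behavior those tests pin down can be sketched without the real Pydantic class; `CacheTransceiverConfigSketch` below is a hypothetical stand-in, and only the Literal values come from the PR:

```python
from typing import Literal, Optional, get_args

# Only these uppercase values come from the PR; the validation logic is a
# hand-rolled approximation of what the Pydantic model enforces.
BackendLiteral = Literal["DEFAULT", "UCX", "NIXL", "MPI"]

class CacheTransceiverConfigSketch:
    """Minimal sketch: reject backend values outside the uppercase Literal."""

    def __init__(self,
                 backend: Optional[str] = None,
                 max_tokens_in_buffer: Optional[int] = None):
        if backend is not None and backend not in get_args(BackendLiteral):
            raise ValueError(f"invalid backend: {backend!r}")
        self.backend = backend
        self.max_tokens_in_buffer = max_tokens_in_buffer
```

With this sketch, backend="UCX" is accepted while the old lowercase "ucx" now fails validation, which is the contract the updated tests assert.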

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15142 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11435 completed with status: 'SUCCESS'
Pipeline passed with automatically retried tests. Check the rerun report for details.

@Shixiaowei02 Shixiaowei02 merged commit 1095dfd into NVIDIA:main Aug 14, 2025
5 checks passed
@Shixiaowei02 Shixiaowei02 deleted the user/xiaoweis/doc branch August 14, 2025 09:50
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 17, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 17, 2025 (NVIDIA#6323), Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 17, 2025 (NVIDIA#6323), Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 17, 2025 (NVIDIA#6323), Signed-off-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 18, 2025 (NVIDIA#6323), Signed-off-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>, Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 18, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 18, 2025 (NVIDIA#6323), Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Labels

Community want to contribute PRs initiated from Community Doc <NV>TRTLLM's textual/illustrative materials: API refs, guides, tutorials. Improvement & clarity.

6 participants