
Conversation


@reasonsolo reasonsolo commented Oct 9, 2025

Summary by CodeRabbit

  • New Features

    • Added disaggregated, cluster-aware serving with dynamic worker discovery and auto-scaling.
    • New CLI option: --disagg_cluster_uri for cluster configuration (see the launch sketch after this list).
    • Readiness-gated requests and health checks; server refuses traffic until ready.
    • New /cluster_info endpoint exposing cluster status and readiness.
    • Router improvements to add/remove servers at runtime for seamless scaling.
    • Validation to ensure appropriate server role when using cluster or metadata config.
  • Tests

    • New end-to-end integration tests for service discovery, minimal instances, and worker/server restarts; added to H100 test list.
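
To illustrate the new option, here is a minimal, hedged sketch of launching a cluster-aware worker from Python, in the style of the integration-test helpers further down. Only --disagg_cluster_uri is confirmed by this PR; the model path, the --server_role flag, the role value, and the URI scheme are placeholder assumptions:

import subprocess

# Minimal sketch: spawn a worker that registers itself for discovery.
# Only "--disagg_cluster_uri" is confirmed by this PR; the model path,
# "--server_role" flag, role value, and URI scheme are illustrative guesses.
worker = subprocess.Popen([
    "trtllm-serve", "/path/to/model",
    "--server_role", "CONTEXT",
    "--disagg_cluster_uri", "http://cluster-storage:9000",
])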

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only [pytorch, cpp, tensorrt, triton] are supported. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

@reasonsolo reasonsolo force-pushed the tllm7843_sdimpl_pr branch 2 times, most recently from f59ef4c to 79b56a0 on October 9, 2025 at 07:20
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
@reasonsolo reasonsolo marked this pull request as ready for review October 9, 2025 07:31
@reasonsolo reasonsolo requested review from a team as code owners October 9, 2025 07:31

coderabbitai bot commented Oct 9, 2025

📝 Walkthrough


Adds cluster-aware disaggregated serving: new cluster config plumbing from CLI and serve entrypoint into OpenAIServer and disaggregated server; dynamic router server management; cluster storage/manager lifecycle, readiness checks, and worker discovery. Introduces cluster configuration extraction utilities. Adds integration tests for auto-scaling, minimal instances, restarts, and test list entry.
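
To make the readiness gating and the /cluster_info endpoint concrete, here is a minimal FastAPI-style sketch of the pattern described above; the handler bodies, the state class, and all response fields except is_ready (which the integration tests poll) are illustrative assumptions rather than the PR's actual code:

from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

class ClusterState:
    """Illustrative stand-in for the real cluster manager/router state."""

    def __init__(self):
        self.ctx_servers = []  # context worker URLs discovered at runtime
        self.gen_servers = []  # generation worker URLs discovered at runtime

    def is_ready(self) -> bool:
        # Ready only when both roles have at least one live worker.
        return bool(self.ctx_servers) and bool(self.gen_servers)

state = ClusterState()

@app.get("/cluster_info")
async def cluster_info():
    # "is_ready" matches the field the tests poll; the worker counts
    # are assumed fields, shown here for illustration only.
    return {
        "is_ready": state.is_ready(),
        "ctx_workers": len(state.ctx_servers),
        "gen_workers": len(state.gen_servers),
    }

@app.post("/v1/completions")
async def completions(request: dict):
    if not state.is_ready():
        # Refuse traffic until discovery has found enough workers.
        return JSONResponse(status_code=503,
                            content={"error": "cluster not ready"})
    ...  # route to a context server, then to a generation server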

Changes

  • Serve entrypoint and wiring (tensorrt_llm/commands/serve.py): Adds disaggregated cluster support via the new CLI option disagg_cluster_uri; derives cluster_config and passes it to server launch. Requires server_role when metadata or cluster config is present. Updates the launch_server signature to include cluster_config.
  • Disagg cluster config utilities (tensorrt_llm/llmapi/disagg_utils.py): Extends DisaggServerConfig with an optional cluster_config. Updates extract_disagg_cfg to accept cluster and build cluster_config when provided. Adds extract_cluster_config(...) and the helper update_dataclass(...).
  • OpenAI server (monolithic) cluster worker registration (tensorrt_llm/serve/openai_server.py): Extends the constructor to accept an optional cluster_config. On startup, creates cluster storage and registers a DisaggClusterWorker with the bound host/port. Stores binding details on the instance.
  • Disaggregated OpenAI server with cluster management (tensorrt_llm/serve/openai_disagg_server.py): Integrates cluster_config, cluster storage/manager, a worker watch loop, dynamic router updates, readiness gating, a cluster_info endpoint, and lifecycle cleanup. Enforces mutual exclusivity with the metadata server.
  • Routers: dynamic server management (tensorrt_llm/serve/router.py): Adds a public servers property and abstract add_server/remove_server. Implements dynamic add/remove with locking for the RoundRobin, LoadBalancing, and KvCacheAware routers; updates selection and load tracking. Refines create_router defaulting. (A router sketch follows this list.)
  • Cluster storage logging (tensorrt_llm/serve/cluster_storage.py): Emits the expired-keys debug log only when deletions occur; no functional change.
  • Integration tests: disagg auto-scaling (tests/integration/defs/disaggregated/test_auto_scaling.py): New tests covering service discovery, minimal instances, and worker/server restarts using HTTP readiness and OpenAI-compatible requests; includes fixtures and helper processes.
  • Test list update (tests/integration/test_lists/test-db/l0_h100.yml): Adds the new disaggregated auto-scaling test to the pre-merge H100 list.
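
As a rough illustration of the dynamic router changes summarized above, a round-robin router with runtime add/remove under a lock might look like the following sketch; the method names mirror the summary, but the bodies are assumptions, not the PR's implementation:

import asyncio
from typing import List

class RoundRobinRouterSketch:
    """Illustrative only; not the real tensorrt_llm/serve/router.py class."""

    def __init__(self, servers: List[str]):
        self._servers = list(servers)
        self._lock = asyncio.Lock()
        self._index = 0

    @property
    def servers(self) -> List[str]:
        # Return a copy so callers cannot mutate state and bypass the lock.
        return list(self._servers)

    async def add_server(self, server: str) -> None:
        async with self._lock:
            if server not in self._servers:
                self._servers.append(server)

    async def remove_server(self, server: str) -> None:
        async with self._lock:
            if server in self._servers:
                self._servers.remove(server)

    async def get_next_server(self) -> str:
        async with self._lock:
            if not self._servers:
                raise RuntimeError("no servers available")
            server = self._servers[self._index % len(self._servers)]
            self._index += 1
            return server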

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Admin as Admin/CLI
  participant Serve as serve.py
  participant OAIS as OpenAIServer
  participant CS as ClusterStorage
  participant CW as ClusterWorker

  Admin->>Serve: serve --disagg_cluster_uri=...
  Serve->>Serve: derive cluster_config
  Serve->>OAIS: init(model, server_role, metadata_cfg, cluster_config)
  OAIS->>OAIS: store host/port on bind
  OAIS->>CS: create storage client
  OAIS->>CW: create and register worker (role from server_role)
  CW-->>CS: register(worker info)
  note over OAIS,CS: Worker registered for discovery
sequenceDiagram
  autonumber
  actor Admin as Admin/CLI
  participant Disagg as OpenAIDisaggServer
  participant CS as ClusterStorage
  participant CM as ClusterManager
  participant CtxR as ContextRouter
  participant GenR as GenerationRouter
  participant CtxW as Context Workers
  participant GenW as Generation Workers
  actor Client as Client

  Admin->>Disagg: start with cluster_config
  Disagg->>CS: create storage client
  Disagg->>CM: start manager, watch workers
  loop watch events
    CM-->>Disagg: worker up/down events
    Disagg->>CtxR: add/remove server (ctx role)
    Disagg->>GenR: add/remove server (gen role)
  end
  Client->>Disagg: completion/chat request
  Disagg->>Disagg: is_ready? (CM ready and routers non-empty)
  alt ready
    Disagg->>CtxR: select ctx server
    CtxR-->>CtxW: route context
    Disagg->>GenR: select gen server
    GenR-->>GenW: route generation
    GenW-->>Client: response
  else not ready
    Disagg-->>Client: 503/ready=false
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 10.94%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Description Check (⚠️ Warning): The pull request description is still the unmodified template and contains no actual title, no filled "Description" section, and no "Test Coverage" details. Required sections remain empty and placeholder guidance is present instead of concrete information. As a result, it does not meet the repository's PR description requirements. Resolution: replace the template text with a properly formatted PR title according to the repository guidelines, provide a concise description of the changes and rationale in the "Description" section, list the relevant tests in the "Test Coverage" section, and update or check off items in the PR checklist to reflect the completed work.
✅ Passed checks (1 passed)
  • Title Check (✅ Passed): The title follows the required template with a valid JIRA ticket and correct type tag, and clearly summarizes the main feature implementation of disaggregated cluster auto-scaling, matching the changeset's focus.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
tensorrt_llm/serve/openai_disagg_server.py (1)

146-155: Initialize _update_worker_task before use

self._update_worker_task is only assigned inside the cluster-manager branch, but the teardown path always evaluates if self._update_worker_task:. When cluster support is disabled (the default today), that line raises AttributeError during shutdown and the server exits uncleanly. Initialize the attribute up front (e.g. to None) and guard the cancel/await block.

         self.conditional_disagg_config = config.conditional_disagg_config
         self.cluster_config = config.cluster_config
         self.cluster_storage = None
         self.cluster_manager = None
+        self._update_worker_task: asyncio.Task | None = None
@@
-            if self._update_worker_task:
-                self._update_worker_task.cancel()
+            if self._update_worker_task:
+                self._update_worker_task.cancel()
+                with contextlib.suppress(asyncio.CancelledError):
+                    await self._update_worker_task
+                self._update_worker_task = None
tensorrt_llm/llmapi/disagg_utils.py (2)

__all__ exports undefined names

'ServerConfig' and 'extract_server_configs' are not defined in this module. This can break users relying on star-imports/public API.

Apply:

 __all__ = [
-    'ServerConfig',
     'parse_disagg_config_file',
-    'extract_server_configs',
+    'extract_disagg_cfg',
+    'extract_cluster_config',
+    'get_ctx_gen_server_urls',
+    'extract_ctx_gen_cfgs',
+    'extract_router_config',
+    'get_server_configs_dict',
     'split_world_comm',
+    'parse_metadata_server_config_file',
+    'CtxGenServerConfig',
+    'RouterConfig',
+    'ConditionalDisaggConfig',
+    'MinimalInstances',
+    'DisaggClusterConfig',
+    'DisaggServerConfig',
+    'MetadataServerConfig',
 ]

83-86: PEP 585 generics break on Python 3.8

Target is Python 3.8+. Use typing.List/Tuple instead of list[...] and tuple[...].

Apply:

-def get_ctx_gen_server_urls(
-        server_configs: list[CtxGenServerConfig]
-) -> tuple[list[str], list[str]]:
+def get_ctx_gen_server_urls(
+        server_configs: List[CtxGenServerConfig]
+) -> Tuple[List[str], List[str]]:

As per coding guidelines

tensorrt_llm/serve/cluster_storage.py (2)

44-54: Fix Queue.task_done() accounting in drain()

You consume N events but call task_done() only once. This breaks queue accounting (join() can hang, internal counters skew).

Apply:

 async def drain(self):
     events = []
-    event = await self.events.get()
+    event = await self.events.get()
     logger.debug(f"Draining watch event: {self.events.qsize()}")
     events.append(event)
-    while not self.events.empty():
-        event = self.events.get_nowait()
-        events.append(event)
-    self.events.task_done()
+    self.events.task_done()
+    while not self.events.empty():
+        event = self.events.get_nowait()
+        events.append(event)
+        self.events.task_done()
     logger.debug(f"after draining watch event: {self.events.qsize()}")
     return events
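
For context on why the one-to-one pairing above matters, here is a small self-contained example of asyncio.Queue accounting; join() returns only after task_done() has been called once per item that was put:

import asyncio

async def main():
    q: asyncio.Queue = asyncio.Queue()
    for i in range(3):
        q.put_nowait(i)

    while not q.empty():
        item = q.get_nowait()
        print("consumed", item)
        q.task_done()  # one task_done() per retrieved item, or join() hangs

    await q.join()  # returns because puts and task_done calls are balanced
    print("all items accounted for")

asyncio.run(main())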

1-1: Add NVIDIA Apache-2.0 header

Source files must include the NVIDIA Apache-2.0 header with current year. Add at top. As per coding guidelines.

+# Copyright 2025 NVIDIA CORPORATION & AFFILIATES
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
tensorrt_llm/serve/openai_server.py (1)

1-1: Add NVIDIA Apache-2.0 header

Required on all source files. As per coding guidelines.

+#!/usr/bin/env python
+# Copyright 2025 NVIDIA CORPORATION & AFFILIATES
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
-#!/usr/bin/env python
🧹 Nitpick comments (11)
tensorrt_llm/llmapi/disagg_utils.py (2)

35-39: server_role should be Optional

Annotated as ServerRole but default is None → tighten typing.

Apply:

 @dataclass
 class RouterConfig():
     type: str = "round_robin"
     args: dict = field(default_factory=dict)
-    server_role: ServerRole = None
+    server_role: Optional[ServerRole] = None

As per coding guidelines


243-263: extract_cluster_config helper looks good; minor polish optional

Key validation + nested dataclass handling is solid. If you want to appease TRY003, shorten the KeyError message or move long text into the exception class. Otherwise, LGTM.
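
For readers unfamiliar with the helper, a plausible shape for an update_dataclass(...)-style utility is sketched below; this is a guess at the general pattern (overlaying dict keys onto dataclass fields and recursing into nested dataclasses), not the PR's actual implementation:

import dataclasses
from typing import Any, Dict

def update_dataclass_sketch(obj: Any, values: Dict[str, Any]) -> Any:
    """Illustrative only: overlay dict values onto a dataclass instance."""
    for field in dataclasses.fields(obj):
        if field.name not in values:
            continue
        value = values[field.name]
        current = getattr(obj, field.name)
        if dataclasses.is_dataclass(current) and isinstance(value, dict):
            update_dataclass_sketch(current, value)  # recurse into nested config
        else:
            setattr(obj, field.name, value)
    return obj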

tests/integration/defs/disaggregated/test_auto_scaling.py (6)

127-146: Add timeouts and remove redundant sleep; fix f-string without placeholders

Avoid infinite waits and redundant delays.

Apply:

 async def wait_for_disagg_server_ready(port):
     while True:
-        await asyncio.sleep(3)
-        logger.info(f"Waiting for disagg server to be ready")
+        await asyncio.sleep(3)
+        logger.info("Waiting for disagg server to be ready")
         try:
-            info_resp = requests.get(f"http://localhost:{port}/cluster_info")
+            info_resp = requests.get(f"http://localhost:{port}/cluster_info",
+                                     timeout=REQUEST_TIMEOUT)
             if info_resp.status_code == 200:
                 info = info_resp.json()
                 if info["is_ready"]:
                     break
                 logger.info(
                     f"Waiting for disagg server to be ready: {info_resp.json()}"
                 )
             else:
                 logger.info(
                     f"Failed to get cluster info: {info_resp.status_code}")
-            await asyncio.sleep(3)
         except requests.exceptions.RequestException as e:
             logger.error(f"Failed to get cluster info: {e}")

148-158: Add request timeout to worker health checks

Reliability improvement.

Apply:

 async def wait_for_worker_ready(port):
     while True:
         await asyncio.sleep(3)
         logger.info(f"Waiting for worker {port} to be ready")
         try:
-            info_resp = requests.get(f"http://localhost:{port}/health")
+            info_resp = requests.get(f"http://localhost:{port}/health",
+                                     timeout=REQUEST_TIMEOUT)
             if info_resp.status_code == 200:
                 break
         except requests.exceptions.RequestException as e:
             logger.info(f"Failed to get worker info: {e}")

160-174: Add request timeout to cluster_info verification

Apply:

 def verify_cluster_info(ready,
                         ctx_workers=-1,
                         gen_workers=-1,
                         port=TEST_PORT,
                         expected_code=200):
-    info_resp = requests.get(f"http://localhost:{port}/cluster_info")
+    info_resp = requests.get(f"http://localhost:{port}/cluster_info",
+                             timeout=REQUEST_TIMEOUT)
     assert info_resp.status_code == expected_code

116-124: Avoid predictable /tmp path for config file

Use NamedTemporaryFile to reduce collision/injection risk.

Apply:

 def run_disagg_server(disagg_cluster_config, port=TEST_PORT):
-    disagg_server_config_path = f"/tmp/disagg_server_{port}_config.yaml"
-    disagg_cluster_config["port"] = port
-    with open(disagg_server_config_path, "w+") as f:
-        yaml.dump(disagg_cluster_config, f)
-    cmds = ["trtllm-serve", "disaggregated", "-c", disagg_server_config_path]
+    disagg_cluster_config["port"] = port
+    with tempfile.NamedTemporaryFile(mode="w+", suffix=f"_{port}_config.yaml",
+                                     delete=False) as tf:
+        yaml.dump(disagg_cluster_config, tf)
+        tf.flush()
+        disagg_server_config_path = tf.name
+    cmds = ["trtllm-serve", "disaggregated", "-c", disagg_server_config_path]
     f = open("disagg_server.log", "w+")
     p = subprocess.Popen(cmds, stdout=f, stderr=f)
     return p

259-326: Initialize proc vars; use specific exception type; mark router as used

Improves stability and lint cleanliness.

Apply:

 @pytest.mark.parametrize("router", ROUTER_TYPES, indirect=True)
 @pytest.mark.asyncio(loop_scope="module")
 @pytest.mark.timeout(600)
 async def test_worker_restart(model_name, disagg_server_config, worker_config,
                               router):
-    try:
+    assert router in ROUTER_TYPES
+    ctx_worker1 = None
+    ctx_worker2 = None
+    gen_worker1 = None
+    gen_worker2 = None
+    disagg_server = None
+    try:
@@
-        with pytest.raises(Exception):
+        from openai import APIStatusError
+        with pytest.raises(APIStatusError):
             request_completion(model_name, "Hello, my name is", port=TEST_PORT)
@@
-        with pytest.raises(Exception):
+        with pytest.raises(APIStatusError):
             request_completion(model_name, test_prompt, port=TEST_PORT)

331-363: Initialize proc vars; assert router used; tighten expected failure exception

Use AssertionError for verify_cluster_info(expected_code=500).

Apply:

 @pytest.mark.parametrize("router", ["round_robin"], indirect=True)
 @pytest.mark.asyncio(loop_scope="module")
 @pytest.mark.timeout(300)
 async def test_disagg_server_restart(model_name, disagg_server_config,
                                      worker_config, router):
-    try:
+    assert router in ROUTER_TYPES
+    ctx_worker1 = None
+    gen_worker1 = None
+    disagg_server = None
+    try:
@@
-        with pytest.raises(Exception):
+        with pytest.raises(AssertionError):
             verify_cluster_info(False, 1, 1, expected_code=500)
tensorrt_llm/serve/cluster_storage.py (2)

219-226: Type hint mismatch: get_prefix returns Dict, not List

Return type is Dict[str, str]. Please fix the annotation for clarity and tooling.

-    async def get_prefix(self,
-                         key_prefix: str,
-                         keys_only: bool = False) -> List[str]:
+    async def get_prefix(self,
+                         key_prefix: str,
+                         keys_only: bool = False) -> Dict[str, str]:

251-260: Reduce log noise in _notify_watch_event

logger.info runs for every watch handle, even when the key doesn’t match. Log only on actual notify (or downgrade to debug).

         for watch_key, handle in self._watch_handles.items():
             if key.startswith(watch_key):
                 # update queue immediately and wake up the event loop
                 handle.events.put_nowait(
                     WatchEvent(storage_item, event_type))
-            logger.info(
-                f"Notifying watch event for watch key {watch_key} with type {event_type}"
-            )
+                logger.info(
+                    f"Notifying watch event for watch key {watch_key} with type {event_type}"
+                )
tensorrt_llm/serve/openai_server.py (1)

81-83: Type hint: make metadata_server_cfg Optional

It can be None (create_metadata_server handles that), so reflect it in the signature.

-                 metadata_server_cfg: MetadataServerConfig,
+                 metadata_server_cfg: Optional[MetadataServerConfig],
                  cluster_config: Optional[DisaggClusterConfig] = None):
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27677a3 and 459cfcf.

📒 Files selected for processing (8)
  • tensorrt_llm/commands/serve.py (6 hunks)
  • tensorrt_llm/llmapi/disagg_utils.py (5 hunks)
  • tensorrt_llm/serve/cluster_storage.py (1 hunks)
  • tensorrt_llm/serve/openai_disagg_server.py (10 hunks)
  • tensorrt_llm/serve/openai_server.py (5 hunks)
  • tensorrt_llm/serve/router.py (8 hunks)
  • tests/integration/defs/disaggregated/test_auto_scaling.py (1 hunks)
  • tests/integration/test_lists/test-db/l0_h100.yml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tensorrt_llm/serve/cluster_storage.py
  • tensorrt_llm/serve/openai_server.py
  • tensorrt_llm/llmapi/disagg_utils.py
  • tensorrt_llm/serve/openai_disagg_server.py
  • tests/integration/defs/disaggregated/test_auto_scaling.py
  • tensorrt_llm/commands/serve.py
  • tensorrt_llm/serve/router.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tensorrt_llm/serve/cluster_storage.py
  • tensorrt_llm/serve/openai_server.py
  • tensorrt_llm/llmapi/disagg_utils.py
  • tensorrt_llm/serve/openai_disagg_server.py
  • tests/integration/defs/disaggregated/test_auto_scaling.py
  • tensorrt_llm/commands/serve.py
  • tensorrt_llm/serve/router.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tensorrt_llm/serve/cluster_storage.py
  • tensorrt_llm/serve/openai_server.py
  • tensorrt_llm/llmapi/disagg_utils.py
  • tensorrt_llm/serve/openai_disagg_server.py
  • tests/integration/defs/disaggregated/test_auto_scaling.py
  • tensorrt_llm/commands/serve.py
  • tensorrt_llm/serve/router.py
🧬 Code graph analysis (7)
tensorrt_llm/serve/cluster_storage.py (1)
tensorrt_llm/logger.py (1)
  • debug (144-145)
tensorrt_llm/serve/openai_server.py (4)
tensorrt_llm/llmapi/disagg_utils.py (3)
  • DisaggClusterConfig (53-58)
  • MetadataServerConfig (75-80)
  • ServerRole (19-22)
tensorrt_llm/commands/serve.py (1)
  • serve (340-405)
tensorrt_llm/serve/cluster_storage.py (1)
  • create_cluster_storage_client (112-115)
tensorrt_llm/serve/disagg_auto_scaling.py (2)
  • DisaggClusterWorker (189-301)
  • register_worker (229-264)
tensorrt_llm/llmapi/disagg_utils.py (1)
tests/unittest/disaggregated/test_disagg_cluster_manager_worker.py (1)
  • config (34-41)
tensorrt_llm/serve/openai_disagg_server.py (5)
tensorrt_llm/llmapi/disagg_utils.py (2)
  • MetadataServerConfig (75-80)
  • ServerRole (19-22)
tensorrt_llm/commands/serve.py (1)
  • serve (340-405)
tensorrt_llm/serve/cluster_storage.py (4)
  • WatchEventType (26-28)
  • create_cluster_storage (106-109)
  • start (63-64)
  • start (158-161)
tensorrt_llm/serve/disagg_auto_scaling.py (5)
  • DisaggClusterManager (32-186)
  • watch_workers (90-104)
  • is_ready_with_router (184-186)
  • worker_info (218-222)
  • get_worker_events (110-125)
tensorrt_llm/serve/router.py (10)
  • KvCacheAwareRouter (580-707)
  • servers (168-169)
  • add_server (172-176)
  • add_server (444-454)
  • add_server (510-520)
  • add_server (628-638)
  • remove_server (179-183)
  • remove_server (456-467)
  • remove_server (522-533)
  • remove_server (642-650)
tests/integration/defs/disaggregated/test_auto_scaling.py (1)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (2)
  • env (54-61)
  • disagg_server (162-179)
tensorrt_llm/commands/serve.py (1)
tensorrt_llm/llmapi/disagg_utils.py (4)
  • DisaggClusterConfig (53-58)
  • MetadataServerConfig (75-80)
  • ServerRole (19-22)
  • extract_cluster_config (243-262)
tensorrt_llm/serve/router.py (2)
tests/unittest/disaggregated/test_router.py (1)
  • servers (50-51)
tensorrt_llm/logger.py (2)
  • warning (132-133)
  • debug (144-145)
🪛 Ruff (0.13.3)
tensorrt_llm/llmapi/disagg_utils.py

250-251: Avoid specifying long messages outside the exception class

(TRY003)

tensorrt_llm/serve/openai_disagg_server.py

95-95: Avoid specifying long messages outside the exception class

(TRY003)


98-98: Avoid specifying long messages outside the exception class

(TRY003)


102-102: Avoid specifying long messages outside the exception class

(TRY003)


105-105: Avoid specifying long messages outside the exception class

(TRY003)


621-621: Do not assign a lambda expression, use a def

Rewrite worker_repr as a def

(E731)


636-636: Do not catch blind exception: Exception

(BLE001)

tests/integration/defs/disaggregated/test_auto_scaling.py

99-99: subprocess call: check for execution of untrusted input

(S603)


117-117: Probable insecure usage of temporary file or directory: "/tmp/disagg_server_"

(S108)


123-123: subprocess call: check for execution of untrusted input

(S603)


130-130: f-string without any placeholders

Remove extraneous f prefix

(F541)


132-132: Probable use of requests call without timeout

(S113)


153-153: Probable use of requests call without timeout

(S113)


165-165: Probable use of requests call without timeout

(S113)


181-182: try-except-pass detected, consider logging the exception

(S110)


181-181: Do not catch blind exception: Exception

(BLE001)


198-198: Unused function argument: router

(ARG001)


220-220: Unused function argument: router

(ARG001)


237-237: Do not assert blind exception: Exception

(B017)


260-260: Unused function argument: router

(ARG001)


276-276: Do not assert blind exception: Exception

(B017)


296-296: Do not assert blind exception: Exception

(B017)


332-332: Unused function argument: router

(ARG001)


349-349: Do not assert blind exception: Exception

(B017)

tensorrt_llm/commands/serve.py

402-403: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


402-403: Avoid specifying long messages outside the exception class

(TRY003)

tensorrt_llm/serve/router.py

450-450: Consider [*old_servers.copy(), server] instead of concatenation

Replace with [*old_servers.copy(), server]

(RUF005)


517-517: Consider [*old_servers.copy(), server] instead of concatenation

Replace with [*old_servers.copy(), server]

(RUF005)


740-741: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (7)
tests/integration/test_lists/test-db/l0_h100.yml (1)

40-40: Auto-scaling test inclusion looks good

Nice to see the new disaggregated auto-scaling coverage wired into the H100 pre-merge suite. This aligns with the surrounding disaggregated test set.

tensorrt_llm/serve/cluster_storage.py (1)

280-283: Good: reduce noisy logs in idle cycles

Guarding the debug log avoids spam when nothing expires. LGTM.

tensorrt_llm/serve/openai_server.py (5)

31-33: Imports look fine


38-40: Imports look fine


86-92: Store binding fields eagerly

Storing binding_addr/host/port for later registration is reasonable.


140-143: Init cluster fields

Good to initialize to None to avoid attribute errors.


966-968: Persist host/port before serving

Required for worker registration. Looks good.

@reasonsolo reasonsolo requested a review from hchings October 9, 2025 07:58

coderabbitai bot commented Oct 9, 2025

📝 Walkthrough


Adds disaggregated cluster support across serve CLI, server initialization, and routers. Introduces cluster_config plumbing, cluster manager/worker/storage lifecycle, readiness gating, a /cluster_info endpoint, and dynamic router add/remove. Extends disaggregation utilities to construct cluster configs. Adjusts logging in cluster storage. Adds integration tests for auto-scaling and discovery.

Changes

  • Serve CLI & launch plumbing (tensorrt_llm/commands/serve.py): Adds the --disagg_cluster_uri option; updates the serve(...) and launch_server(...) signatures to accept and pass cluster_config; resolves the cluster config from the CLI or config file; enforces server_role when using metadata/cluster config; propagates it to server startup.
  • Disaggregation utilities (tensorrt_llm/llmapi/disagg_utils.py): Extends extract_disagg_cfg(...) with cluster; adds extract_cluster_config(...); augments DisaggServerConfig with cluster_config; initializes nested MinimalInstances; validates keys.
  • OpenAI servers, disagg + base (tensorrt_llm/serve/openai_disagg_server.py, tensorrt_llm/serve/openai_server.py): Wires cluster support: a new cluster_config parameter, cluster storage/manager/worker lifecycle in lifespan, readiness checks gating requests, a /cluster_info endpoint, dynamic router updates on watch events, host/port exposure, and cleanup on shutdown. (A watch-loop sketch follows this list.)
  • Routers: dynamic server management (tensorrt_llm/serve/router.py): Introduces a servers property; adds async add_server/remove_server to the abstract and concrete routers (RoundRobin, LoadBalancing, KvCacheAware); updates internal state with locking; create_router handles the default type when config is None.
  • Cluster storage minor adjustments (tensorrt_llm/serve/cluster_storage.py): Emits the expired-keys debug log only when deletions exist; no functional change to deletion/notification.
  • Integration tests: disaggregated auto-scaling (tests/integration/defs/disaggregated/test_auto_scaling.py, tests/integration/test_lists/test-db/l0_h100.yml): Adds end-to-end tests covering discovery, minimal instances, and worker/server restarts; helpers to spawn workers/server, readiness polling, cluster verification, and OpenAI completion; registers the test in the H100 list.
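
To make the watch-driven router updates concrete, here is a minimal sketch of the event loop described above; the event and manager types are stand-ins for the real ones in disagg_auto_scaling.py and cluster_storage.py, so all names and fields are assumptions:

import asyncio
import enum
from dataclasses import dataclass

class EventType(enum.Enum):
    SET = "set"        # worker registered or refreshed
    DELETE = "delete"  # worker expired or removed

@dataclass
class WorkerEvent:
    event_type: EventType
    role: str  # "context" or "generation" (assumed labels)
    url: str   # e.g. "http://host:port"

async def update_routers(events: asyncio.Queue, ctx_router, gen_router):
    """Apply discovery events to the per-role routers as workers come and go."""
    while True:
        event: WorkerEvent = await events.get()
        router = ctx_router if event.role == "context" else gen_router
        if event.event_type is EventType.SET:
            await router.add_server(event.url)
        else:
            await router.remove_server(event.url)
        events.task_done()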

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CLI as serve CLI
    participant Serve as launch_server
    participant OAI as OpenAIServer
    participant OAI-D as OpenAIDisaggServer
    participant CS as ClusterStorage
    participant CM as ClusterManager
    participant CW as ClusterWorker

    rect rgba(230,240,255,0.4)
    note over CLI: Startup with disaggregated cluster
    CLI->>Serve: parse args (disagg_cluster_uri/server_role)
    Serve->>Serve: extract_cluster_config(...)
    Serve->>OAI: init(..., cluster_config)
    Serve->>OAI-D: init(..., cluster_config)
    end

    rect rgba(240,255,230,0.4)
    note over OAI,OAI-D: Lifespan startup
    OAI->>CS: create client (if cluster_config)
    OAI->>CW: init worker(host, port)
    OAI-->>CS: register worker (deferred in lifespan)

    OAI-D->>CS: create client (if cluster_config)
    OAI-D->>CM: start manager (watch workers)
    CM-->>OAI-D: watch events (add/remove)
    OAI-D->>OAI-D: _update_router_by_watch_events
    end
sequenceDiagram
    autonumber
    participant Client as HTTP Client
    participant OAI-D as OpenAIDisaggServer
    participant RouterC as Context Router
    participant RouterG as Generation Router

    Client->>OAI-D: GET /cluster_info
    OAI-D->>OAI-D: is_ready()
    OAI-D-->>Client: status, counts

    Client->>OAI-D: POST /v1/completions
    OAI-D->>OAI-D: is_ready()? (gate)
    alt ready
        OAI-D->>RouterC: get_next_server()
        OAI-D->>RouterG: get_next_server()
        OAI-D-->>Client: completion
    else not ready
        OAI-D-->>Client: 503/health not ready
    end

    note over OAI-D,RouterC: On CM events
    OAI-D->>RouterC: add_server/remove_server
    OAI-D->>RouterG: add_server/remove_server

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested reviewers

  • schetlur-nv
  • nv-guomingz
  • qiaoxj07
  • Tabrizian

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description Check (⚠️ Warning): The pull request description includes the template structure but lacks any substantive content in the Description and Test Coverage sections, leaving the intended change and associated tests undefined. Resolution: provide a meaningful summary in the Description section explaining the change and fill out the Test Coverage section with the relevant tests that validate the new functionality.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 10.94%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Title Check (✅ Passed): The title follows the required format including the ticket and type, concisely summarizes the main feature (implementing disaggregated cluster auto-scaling), and clearly conveys the primary change in a single sentence.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
tensorrt_llm/serve/openai_server.py (1)

77-83: Make metadata_server_cfg Optional to match call sites and factory

Type it as Optional and default to None to avoid surprising type mismatch and align with create_metadata_server usage.

-    def __init__(self,
-                 llm: Union[LLM, MultimodalEncoder],
-                 model: str,
-                 server_role: Optional[ServerRole],
-                 metadata_server_cfg: MetadataServerConfig,
-                 cluster_config: Optional[DisaggClusterConfig] = None):
+    def __init__(self,
+                 llm: Union[LLM, MultimodalEncoder],
+                 model: str,
+                 server_role: Optional[ServerRole],
+                 metadata_server_cfg: Optional[MetadataServerConfig] = None,
+                 cluster_config: Optional[DisaggClusterConfig] = None):
tensorrt_llm/serve/router.py (4)

248-275: Don’t assert in monitoring loop; handle empty/failed states and don’t re-raise

Asserting on runtime state and re-raising inside a long-lived task will kill monitoring and undermine the server's resilience. Log and continue instead.

-                assert final_servers, f"No {self._server_role} servers available"
+                if not final_servers:
+                    logger.warning(f"No {self._server_role} servers available")
+                    await asyncio.sleep(poll_interval)
+                    continue
@@
-            except Exception as e:
-                logger.error(f"Error in server monitoring: {e}")
-                raise
+            except Exception as e:
+                logger.error(f"Error in server monitoring: {e}")
+                await asyncio.sleep(poll_interval)
+                continue

381-399: Health-check should not assert on timeout; provide a safe default

When metadata_server_cfg is None or is missing a timeout, the assert will raise. Use a default timeout.

-        assert self._health_check_timeout is not None, "health_check_timeout is not set"
+        timeout = self._health_check_timeout or 5.0
         try:
             async with self._session.get(
-                    f"{server_url}/health",
-                    timeout=self._health_check_timeout) as response:
+                    f"{server_url}/health",
+                    timeout=timeout) as response:

68-74: ServerState.is_healthy uses undefined self._session

This method will fail if used. Either pass a session or create one locally. Minimal fix below.

     async def is_healthy(self) -> bool:
-        try:
-            async with self._session.get(self._server + "/health") as response:
-                return response.status == 200
-        except Exception:
-            return False
+        try:
+            async with aiohttp.ClientSession() as session:
+                async with session.get(self._server + "/health") as response:
+                    return response.status == 200
+        except Exception:
+            return False

1-1: Missing NVIDIA Apache-2.0 header

Add the NVIDIA Apache-2.0 copyright header with current year at the top.

tensorrt_llm/commands/serve.py (1)

1-1: Missing NVIDIA Apache-2.0 header

Add the NVIDIA Apache-2.0 copyright header with current year at the top.

🧹 Nitpick comments (6)
tensorrt_llm/serve/openai_disagg_server.py (1)

620-638: Avoid assigning a lambda; define a small helper instead

Define worker_repr with def for clarity and to satisfy linters.

-        worker_repr = lambda worker_info: f"http://{worker_info.host}:{worker_info.port}"
+        def worker_repr(worker_info):
+            return f"http://{worker_info.host}:{worker_info.port}"

Additionally, consider narrowing the broad except to expected exceptions in this loop.

tensorrt_llm/serve/router.py (1)

167-170: Avoid exposing mutable internal state in servers property

Return a copy to prevent external mutation and lock bypass.

 @property
 def servers(self) -> List[str]:
-    return self._servers
+    return list(self._servers)
tests/integration/defs/disaggregated/test_auto_scaling.py (4)

130-146: Add timeouts to requests and remove unnecessary f-string

Avoid indefinite blocking in tests and fix F541.

-        logger.info(f"Waiting for disagg server to be ready")
+        logger.info("Waiting for disagg server to be ready")
         try:
-            info_resp = requests.get(f"http://localhost:{port}/cluster_info")
+            info_resp = requests.get(f"http://localhost:{port}/cluster_info",
+                                     timeout=5)
             if info_resp.status_code == 200:
                 info = info_resp.json()
                 if info["is_ready"]:
                     break
                 logger.info(
                     f"Waiting for disagg server to be ready: {info_resp.json()}"
                 )
             else:
                 logger.info(
                     f"Failed to get cluster info: {info_resp.status_code}")
             await asyncio.sleep(3)
         except requests.exceptions.RequestException as e:
             logger.error(f"Failed to get cluster info: {e}")

149-158: Add timeout to worker health checks

Prevents hangs.

-            info_resp = requests.get(f"http://localhost:{port}/health")
+            info_resp = requests.get(f"http://localhost:{port}/health",
+                                     timeout=5)

165-174: Add timeout to cluster_info verification requests

Prevents hangs.

-    info_resp = requests.get(f"http://localhost:{port}/cluster_info")
+    info_resp = requests.get(f"http://localhost:{port}/cluster_info",
+                             timeout=5)

176-183: Log exceptions on terminate for triage

Don’t silently swallow errors in the integration harness.

 def terminate(*args):
     try:
         for arg in args:
             arg.terminate()
             arg.wait()
-    except Exception:
-        pass
+    except Exception as e:
+        logger.warning(f"terminate() encountered an error: {e}")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27677a3 and 459cfcf.

📒 Files selected for processing (8)
  • tensorrt_llm/commands/serve.py (6 hunks)
  • tensorrt_llm/llmapi/disagg_utils.py (5 hunks)
  • tensorrt_llm/serve/cluster_storage.py (1 hunks)
  • tensorrt_llm/serve/openai_disagg_server.py (10 hunks)
  • tensorrt_llm/serve/openai_server.py (5 hunks)
  • tensorrt_llm/serve/router.py (8 hunks)
  • tests/integration/defs/disaggregated/test_auto_scaling.py (1 hunks)
  • tests/integration/test_lists/test-db/l0_h100.yml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tensorrt_llm/serve/cluster_storage.py
  • tensorrt_llm/llmapi/disagg_utils.py
  • tensorrt_llm/serve/openai_server.py
  • tensorrt_llm/commands/serve.py
  • tensorrt_llm/serve/openai_disagg_server.py
  • tensorrt_llm/serve/router.py
  • tests/integration/defs/disaggregated/test_auto_scaling.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tensorrt_llm/serve/cluster_storage.py
  • tensorrt_llm/llmapi/disagg_utils.py
  • tensorrt_llm/serve/openai_server.py
  • tensorrt_llm/commands/serve.py
  • tensorrt_llm/serve/openai_disagg_server.py
  • tensorrt_llm/serve/router.py
  • tests/integration/defs/disaggregated/test_auto_scaling.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tensorrt_llm/serve/cluster_storage.py
  • tensorrt_llm/llmapi/disagg_utils.py
  • tensorrt_llm/serve/openai_server.py
  • tensorrt_llm/commands/serve.py
  • tensorrt_llm/serve/openai_disagg_server.py
  • tensorrt_llm/serve/router.py
  • tests/integration/defs/disaggregated/test_auto_scaling.py
🧬 Code graph analysis (7)
tensorrt_llm/serve/cluster_storage.py (1)
tensorrt_llm/logger.py (1)
  • debug (144-145)
tensorrt_llm/llmapi/disagg_utils.py (1)
tests/unittest/disaggregated/test_disagg_cluster_manager_worker.py (1)
  • config (34-41)
tensorrt_llm/serve/openai_server.py (4)
tensorrt_llm/llmapi/disagg_utils.py (2)
  • DisaggClusterConfig (53-58)
  • ServerRole (19-22)
tensorrt_llm/commands/serve.py (1)
  • serve (340-405)
tensorrt_llm/serve/cluster_storage.py (1)
  • create_cluster_storage_client (112-115)
tensorrt_llm/serve/disagg_auto_scaling.py (2)
  • DisaggClusterWorker (189-301)
  • register_worker (229-264)
tensorrt_llm/commands/serve.py (1)
tensorrt_llm/llmapi/disagg_utils.py (4)
  • DisaggClusterConfig (53-58)
  • MetadataServerConfig (75-80)
  • ServerRole (19-22)
  • extract_cluster_config (243-262)
tensorrt_llm/serve/openai_disagg_server.py (5)
tensorrt_llm/llmapi/disagg_utils.py (1)
  • ServerRole (19-22)
tensorrt_llm/commands/serve.py (1)
  • serve (340-405)
tensorrt_llm/serve/cluster_storage.py (4)
  • WatchEventType (26-28)
  • create_cluster_storage (106-109)
  • start (63-64)
  • start (158-161)
tensorrt_llm/serve/disagg_auto_scaling.py (6)
  • DisaggClusterManager (32-186)
  • watch_workers (90-104)
  • is_ready_with_router (184-186)
  • worker_info (218-222)
  • get_worker_events (110-125)
  • worker_id (214-215)
tensorrt_llm/serve/router.py (10)
  • KvCacheAwareRouter (580-707)
  • servers (168-169)
  • add_server (172-176)
  • add_server (444-454)
  • add_server (510-520)
  • add_server (628-638)
  • remove_server (179-183)
  • remove_server (456-467)
  • remove_server (522-533)
  • remove_server (642-650)
tensorrt_llm/serve/router.py (1)
tensorrt_llm/logger.py (2)
  • warning (132-133)
  • debug (144-145)
tests/integration/defs/disaggregated/test_auto_scaling.py (2)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (2)
  • env (54-61)
  • disagg_server (162-179)
tensorrt_llm/logger.py (1)
  • error (126-127)
🪛 Ruff (0.13.3)
tensorrt_llm/llmapi/disagg_utils.py

250-251: Avoid specifying long messages outside the exception class

(TRY003)

tensorrt_llm/commands/serve.py

402-403: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


402-403: Avoid specifying long messages outside the exception class

(TRY003)

tensorrt_llm/serve/openai_disagg_server.py

95-95: Avoid specifying long messages outside the exception class

(TRY003)


98-98: Avoid specifying long messages outside the exception class

(TRY003)


102-102: Avoid specifying long messages outside the exception class

(TRY003)


105-105: Avoid specifying long messages outside the exception class

(TRY003)


621-621: Do not assign a lambda expression, use a def

Rewrite worker_repr as a def

(E731)


636-636: Do not catch blind exception: Exception

(BLE001)

tensorrt_llm/serve/router.py

450-450: Consider [*old_servers.copy(), server] instead of concatenation

Replace with [*old_servers.copy(), server]

(RUF005)


517-517: Consider [*old_servers.copy(), server] instead of concatenation

Replace with [*old_servers.copy(), server]

(RUF005)


740-741: Avoid specifying long messages outside the exception class

(TRY003)

tests/integration/defs/disaggregated/test_auto_scaling.py

99-99: subprocess call: check for execution of untrusted input

(S603)


117-117: Probable insecure usage of temporary file or directory: "/tmp/disagg_server_"

(S108)


123-123: subprocess call: check for execution of untrusted input

(S603)


130-130: f-string without any placeholders

Remove extraneous f prefix

(F541)


132-132: Probable use of requests call without timeout

(S113)


153-153: Probable use of requests call without timeout

(S113)


165-165: Probable use of requests call without timeout

(S113)


181-182: try-except-pass detected, consider logging the exception

(S110)


181-181: Do not catch blind exception: Exception

(BLE001)


198-198: Unused function argument: router

(ARG001)


220-220: Unused function argument: router

(ARG001)


237-237: Do not assert blind exception: Exception

(B017)


260-260: Unused function argument: router

(ARG001)


276-276: Do not assert blind exception: Exception

(B017)


296-296: Do not assert blind exception: Exception

(B017)


332-332: Unused function argument: router

(ARG001)


349-349: Do not assert blind exception: Exception

(B017)

🔇 Additional comments (2)
tensorrt_llm/serve/cluster_storage.py (1)

280-283: LGTM: avoid noisy debug log when nothing expires

Conditionally logging only when deletions occur is appropriate.

tensorrt_llm/llmapi/disagg_utils.py (1)

71-71: LGTM: DisaggServerConfig gains cluster_config

Addition looks consistent with new server plumbing.

@reasonsolo reasonsolo force-pushed the tllm7843_sdimpl_pr branch 2 times, most recently from 688a074 to 2fcd02a on October 9, 2025 at 08:49
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
@reasonsolo
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #20873 [ run ] triggered by Bot

Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
@tensorrt-cicd
Collaborator

PR_Github #20873 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #15788 completed with status: 'FAILURE'

@reasonsolo
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #20956 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #20956 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #15850 completed with status: 'FAILURE'

@reasonsolo
Collaborator Author

/bot run

@reasonsolo
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #21848 [ run ] triggered by Bot. Commit: 7631dba

@tensorrt-cicd
Collaborator

PR_Github #21848 [ run ] completed with state FAILURE. Commit: 7631dba
/LLM/main/L0_MergeRequest_PR pipeline #16470 completed with status: 'FAILURE'

@reasonsolo
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #21851 [ run ] triggered by Bot. Commit: 7631dba

@reasonsolo
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #21857 [ run ] triggered by Bot. Commit: feba150

@tensorrt-cicd
Collaborator

PR_Github #21851 [ run ] completed with state ABORTED. Commit: 7631dba
LLM/main/L0_MergeRequest_PR #16473 (Blue Ocean) completed with status: ABORTED

@tensorrt-cicd
Collaborator

PR_Github #21857 [ run ] completed with state SUCCESS. Commit: feba150
/LLM/main/L0_MergeRequest_PR pipeline #16477 completed with status: 'FAILURE'

@reasonsolo
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #21885 [ run ] triggered by Bot. Commit: feba150

@tensorrt-cicd
Collaborator

PR_Github #21885 [ run ] completed with state SUCCESS. Commit: feba150
/LLM/main/L0_MergeRequest_PR pipeline #16499 completed with status: 'FAILURE'

@reasonsolo
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #21977 [ run ] triggered by Bot. Commit: 8f47771

@tensorrt-cicd
Collaborator

PR_Github #21977 [ run ] completed with state SUCCESS. Commit: 8f47771
/LLM/main/L0_MergeRequest_PR pipeline #16573 completed with status: 'FAILURE'

@reasonsolo
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #22022 [ run ] triggered by Bot. Commit: 8f47771

@tensorrt-cicd
Collaborator

PR_Github #22022 [ run ] completed with state SUCCESS. Commit: 8f47771
/LLM/main/L0_MergeRequest_PR pipeline #16603 completed with status: 'SUCCESS'

@reasonsolo reasonsolo merged commit 23d5280 into NVIDIA:main Oct 21, 2025
5 checks passed
@juney-nvidia
Collaborator

@reasonsolo Hi Lizhi, do we have any doc or example telling users how to use this feature?

Thanks
June

@reasonsolo
Collaborator Author

@reasonsolo Hi Lizhi, do we have any doc or example telling users how to use this feature?

Thanks June

The PR for the doc and example is ready at #8602

yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request Oct 24, 2025
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Nov 1, 2025
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Nov 3, 2025
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Nov 3, 2025
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Nov 3, 2025
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>