Conversation

@lancelly (Collaborator) commented Aug 6, 2025

This pull request introduces comprehensive documentation and an example script for tool calling with the Kimi-K2 model, focusing on TensorRT-LLM deployments. The changes provide practical guidance and code for parsing and handling tool-call requests, including output format handling and manual parsing when guided decoding is unavailable.

Summary by CodeRabbit

  • New Features

    • Added comprehensive documentation for the Kimi-K2-Instruct model, including setup instructions and detailed guidance on using its tool calling capabilities.
    • Introduced a new example script demonstrating how to interact with the model for tool calling, including weather query functionality and output parsing.
  • Documentation

    • Provided step-by-step usage examples, command-line instructions, and troubleshooting tips for tool calling with Kimi-K2.

@lancelly lancelly requested a review from a team as a code owner August 6, 2025 13:23
@lancelly lancelly requested review from kaiyux and kevinch-nv August 6, 2025 13:23
coderabbitai bot (Contributor) commented Aug 6, 2025

📝 Walkthrough

A new README and example Python script have been added for the Kimi-K2-Instruct model. The README details the model's tool calling capabilities and usage instructions, while the script demonstrates how to interact with the model for tool (API) calling, including parsing model outputs and invoking local functions.

Changes

  • Documentation for Kimi-K2 Tool Calling (examples/models/core/kimi_k2/README.md): Added a comprehensive README describing Kimi-K2's tool calling features, usage instructions, and example workflows.
  • Tool Calling Example Script (examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py): Added a Python script demonstrating tool call interaction, output parsing, and local tool execution for Kimi-K2.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ExampleScript
    participant KimiK2Model
    participant LocalTool

    User->>ExampleScript: Provide prompt and tool specs
    ExampleScript->>KimiK2Model: Send prompt (with tool info)
    KimiK2Model-->>ExampleScript: Generate tool call request
    ExampleScript->>ExampleScript: Parse tool call from output
    ExampleScript->>LocalTool: Invoke tool with parsed arguments
    LocalTool-->>ExampleScript: Return tool result
    ExampleScript->>User: Output tool call and result
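The parse-and-invoke loop in the diagram can be sketched end to end in Python. Everything below is hypothetical: the model call is mocked with a canned response in the Kimi-K2 delimiter format this PR documents, and the tool function is a stand-in for the script's weather example, not code from the PR.

```python
import json
import re

# Hypothetical local tool, standing in for the script's weather example.
def get_weather(location):
    return {"location": location, "forecast": "sunny"}

tool_map = {"get_weather": get_weather}

def call_model(prompt):
    # Mocked model response using the K2 delimiters described in this PR;
    # a real deployment would send `prompt` to the trtllm-serve endpoint.
    return ("<|tool_calls_section_begin|><|tool_call_begin|>functions.get_weather:0"
            "<|tool_call_argument_begin|>{\"location\": \"Paris\"}<|tool_call_end|>"
            "<|tool_calls_section_end|>")

output = call_model("What is the weather in Paris?")
pattern = (r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[\w\.]+:\d+)\s*"
           r"<\|tool_call_argument_begin\|>\s*(?P<args>.*?)\s*<\|tool_call_end\|>")
results = []
for m in re.finditer(pattern, output):
    # Tool ID has the form functions.func_name:idx -> extract func_name.
    func_name = m.group("tool_call_id").split(":")[0].split(".")[-1]
    kwargs = json.loads(m.group("args"))
    results.append(tool_map[func_name](**kwargs))
print(results)
```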

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested labels

Community want to contribute

Suggested reviewers

  • nv-guomingz
  • QiJune


@nv-guomingz (Collaborator) commented:

Could we avoid the merge commit in the history?

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 8

🔭 Outside diff range comments (1)
examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py (1)

160-171: Add error handling for unknown tools.

The tool execution loop could fail with a KeyError if an unknown tool name is returned by the model.

Add error handling:

     for tool_call in tool_calls:
         tool_name = tool_call['function']['name']
+        if tool_name not in tool_map:
+            print(f"[Error]: Unknown tool '{tool_name}' requested")
+            continue
+            
         if args.specify_output_format:
             tool_arguments = tool_call['function']['arguments']
         else:
             tool_arguments = json.loads(tool_call['function']['arguments'])
         tool_function = tool_map[tool_name]
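A minimal, runnable sketch of the guarded dispatch loop; the tool registry and the second tool name are invented for illustration, not taken from the PR.

```python
import json

# Hypothetical tool registry; the real script maps names to its own functions.
tool_map = {"get_weather": lambda location: f"Sunny in {location}"}

tool_calls = [
    {"function": {"name": "get_weather", "arguments": '{"location": "Paris"}'}},
    {"function": {"name": "get_stock_price", "arguments": "{}"}},  # not registered
]

results = []
for tool_call in tool_calls:
    tool_name = tool_call["function"]["name"]
    if tool_name not in tool_map:  # guard against unknown tool names
        results.append(f"[Error]: Unknown tool '{tool_name}' requested")
        continue
    kwargs = json.loads(tool_call["function"]["arguments"])
    results.append(tool_map[tool_name](**kwargs))
print(results)  # the first call succeeds; the second is skipped with an error
```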
🧹 Nitpick comments (8)
examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py (8)

8-18: Consider breaking long prompt strings for better readability.

The multi-line prompt constants exceed the 120-character line limit. While the content is important, consider breaking them into shorter lines for better maintainability.

Here's a suggestion for the first prompt:

-SPECIFY_OUTPUT_FORMAT_PROMPT = """You are an AI assistant with the role name "assistant." Based on the provided API specifications and conversation history from steps 1 to t, generate the API requests that the assistant should call in step t+1. The API requests should be output in the format [api_name(key1='value1', key2='value2', ...)], replacing api_name with the actual API name, key1, key2, etc., with the actual parameter names, and value1, value2, etc., with the actual parameter values. The output should start with a square bracket "[" and end with a square bracket "]".
+SPECIFY_OUTPUT_FORMAT_PROMPT = """You are an AI assistant with the role name "assistant." Based on the provided API specifications and conversation history from steps 1 to t, generate the API requests that the assistant should call in step t+1. The API requests should be output in the format [api_name(key1='value1', key2='value2', ...)], replacing api_name with the actual API name, key1, key2, etc., with the actual parameter names, and value1, value2, etc., with the actual parameter values. The output should start with a square bracket "[" and end with a square bracket "]".

Similar treatment can be applied to the second prompt and other long lines.


116-117: Fix formatting issue in system prompt construction.

Line 116 has a formatting issue that makes it hard to read.

-    system_prompt = SPECIFY_OUTPUT_FORMAT_PROMPT if args.specify_output_format else NOT_SPECIFY_OUTPUT_FORMAT_PROMPT.format(
-        tools=tools)
+    if args.specify_output_format:
+        system_prompt = SPECIFY_OUTPUT_FORMAT_PROMPT
+    else:
+        system_prompt = NOT_SPECIFY_OUTPUT_FORMAT_PROMPT.format(tools=tools)

8-22: Fix line length violations in string constants.

Multiple lines exceed the 120-character limit. Consider breaking long strings into multiple lines for better readability.

Apply this diff to fix line length violations:

-SPECIFY_OUTPUT_FORMAT_PROMPT = """You are an AI assistant with the role name "assistant." Based on the provided API specifications and conversation history from steps 1 to t, generate the API requests that the assistant should call in step t+1. The API requests should be output in the format [api_name(key1='value1', key2='value2', ...)], replacing api_name with the actual API name, key1, key2, etc., with the actual parameter names, and value1, value2, etc., with the actual parameter values. The output should start with a square bracket "[" and end with a square bracket "]".
-If there are multiple API requests, separate them with commas, for example: [api_name(key1='value1', key2='value2', ...), api_name(key1='value1', key2='value2', ...), ...]. Do not include any other explanations, prompts, or API call results in the output.
-If the API parameter description does not specify otherwise, the parameter is optional (parameters mentioned in the user input need to be included in the output; if not mentioned, they do not need to be included).
-If the API parameter description does not specify the required format for the value, use the user's original text for the parameter value.
-If the API requires no parameters, output the API request directly in the format [api_name()], and do not invent any nonexistent parameter names.
+SPECIFY_OUTPUT_FORMAT_PROMPT = """You are an AI assistant with the role name "assistant." Based on the provided API \
+specifications and conversation history from steps 1 to t, generate the API requests that the assistant should call in \
+step t+1. The API requests should be output in the format [api_name(key1='value1', key2='value2', ...)], replacing \
+api_name with the actual API name, key1, key2, etc., with the actual parameter names, and value1, value2, etc., with \
+the actual parameter values. The output should start with a square bracket "[" and end with a square bracket "]".
+If there are multiple API requests, separate them with commas, for example: \
+[api_name(key1='value1', key2='value2', ...), api_name(key1='value1', key2='value2', ...), ...]. \
+Do not include any other explanations, prompts, or API call results in the output.
+If the API parameter description does not specify otherwise, the parameter is optional (parameters mentioned in the \
+user input need to be included in the output; if not mentioned, they do not need to be included).
+If the API parameter description does not specify the required format for the value, use the user's original text for \
+the parameter value.
+If the API requires no parameters, output the API request directly in the format [api_name()], and do not invent any \
+nonexistent parameter names.

-NOT_SPECIFY_OUTPUT_FORMAT_PROMPT = """Important: Only give the tool call requests, do not include any other explanations, prompts, or API call results in the output.
-The tool call requests generated by you are wrapped by <|tool_calls_section_begin|> and <|tool_calls_section_end|>, with each tool call wrapped by <|tool_call_begin|> and <|tool_call_end|>. The tool ID and arguments are separated by <|tool_call_argument_begin|>. The format of the tool ID is functions.func_name:idx, from which we can parse the function name.
+NOT_SPECIFY_OUTPUT_FORMAT_PROMPT = """Important: Only give the tool call requests, do not include any other \
+explanations, prompts, or API call results in the output.
+The tool call requests generated by you are wrapped by <|tool_calls_section_begin|> and \
+<|tool_calls_section_end|>, with each tool call wrapped by <|tool_call_begin|> and <|tool_call_end|>. \
+The tool ID and arguments are separated by <|tool_call_argument_begin|>. The format of the tool ID is \
+functions.func_name:idx, from which we can parse the function name.

37-62: Fix line length violation in regex pattern.

The function logic is correct and properly handles the parsing of tool call information from the model output.

Fix the line length violation on Line 47:

-    func_call_pattern = r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[\w\.]+:\d+)\s*<\|tool_call_argument_begin\|>\s*(?P<function_arguments>.*?)\s*<\|tool_call_end\|>"
+    func_call_pattern = (
+        r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[\w\.]+:\d+)\s*"
+        r"<\|tool_call_argument_begin\|>\s*(?P<function_arguments>.*?)\s*<\|tool_call_end\|>"
+    )
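The split pattern behaves identically to the single-line original. A quick check against a sample completion; the sample string is invented to match the delimiter format documented in the README.

```python
import re

func_call_pattern = (
    r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[\w\.]+:\d+)\s*"
    r"<\|tool_call_argument_begin\|>\s*(?P<function_arguments>.*?)\s*<\|tool_call_end\|>"
)

# Invented sample in the K2 tool-call format.
sample = ("<|tool_call_begin|> functions.get_weather:0 "
          "<|tool_call_argument_begin|> {\"location\": \"SF\"} <|tool_call_end|>")

match = re.search(func_call_pattern, sample)
print(match.group("tool_call_id"))        # functions.get_weather:0
print(match.group("function_arguments"))  # {"location": "SF"}
```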

113-137: Fix line length violation and approve the API logic.

The function correctly handles API communication and tool call parsing.

Fix the line length violation on Line 116:

-    system_prompt = SPECIFY_OUTPUT_FORMAT_PROMPT if args.specify_output_format else NOT_SPECIFY_OUTPUT_FORMAT_PROMPT.format(
-        tools=tools)
+    system_prompt = (
+        SPECIFY_OUTPUT_FORMAT_PROMPT 
+        if args.specify_output_format 
+        else NOT_SPECIFY_OUTPUT_FORMAT_PROMPT.format(tools=tools)
+    )

37-63: Fix line length violation in regex pattern.

The function logic is solid and correctly parses the K2 tool call format. However, line 47 exceeds the 120-character limit.

-    func_call_pattern = r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[\w\.]+:\d+)\s*<\|tool_call_argument_begin\|>\s*(?P<function_arguments>.*?)\s*<\|tool_call_end\|>"
+    func_call_pattern = (
+        r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[\w\.]+:\d+)\s*"
+        r"<\|tool_call_argument_begin\|>\s*(?P<function_arguments>.*?)\s*<\|tool_call_end\|>"
+    )

65-89: Consider more specific exception handling.

The function correctly parses the specified format and uses ast.literal_eval for safe argument evaluation. However, the broad except Exception could be more specific.

         try:
             kwargs[k] = ast.literal_eval(v.strip())
-        except Exception:
+        except (ValueError, SyntaxError):
             kwargs[k] = v.strip()

113-138: Fix line length violation and improve readability.

The function correctly handles both output formats and makes appropriate API calls. However, line 116 exceeds the 120-character limit.

-    system_prompt = SPECIFY_OUTPUT_FORMAT_PROMPT if args.specify_output_format else NOT_SPECIFY_OUTPUT_FORMAT_PROMPT.format(
-        tools=tools)
+    if args.specify_output_format:
+        system_prompt = SPECIFY_OUTPUT_FORMAT_PROMPT
+    else:
+        system_prompt = NOT_SPECIFY_OUTPUT_FORMAT_PROMPT.format(tools=tools)
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13ecb4a and 5041666.

📒 Files selected for processing (2)
  • examples/models/core/kimi_k2/README.md (1 hunks)
  • examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py (1 hunks)
👮 Files not reviewed due to content moderation or server errors (1)
  • examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a class in the constructor in Python.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py
🧠 Learnings (1)
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • examples/models/core/kimi_k2/README.md
🪛 Ruff (0.12.2)
examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py

8-8: Line too long (580 > 120)

(E501)


9-9: Line too long (255 > 120)

(E501)


10-10: Line too long (213 > 120)

(E501)


11-11: Line too long (138 > 120)

(E501)


12-12: Line too long (145 > 120)

(E501)


17-17: Line too long (165 > 120)

(E501)


18-18: Line too long (359 > 120)

(E501)


47-47: Line too long (165 > 120)

(E501)


116-116: Line too long (124 > 120)

(E501)

🔇 Additional comments (27)
examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py (13)

24-35: LGTM! Clean tool implementation and mapping.

The get_weather function provides a good example for demonstration purposes, and the tool_map dictionary provides a clean mechanism for dynamic function lookup.


37-63: LGTM! Robust parsing with good documentation reference.

The extract_tool_call_info function handles the K2-specific delimiter format correctly, with proper regex parsing and structured output. The reference to the HuggingFace documentation is helpful.


65-89: Good use of ast.literal_eval for safe argument parsing.

The function safely parses arguments using ast.literal_eval with appropriate fallback to string parsing. The regex patterns correctly handle the function call format.


91-111: LGTM! Proper tool schema definition.

The function correctly defines the tool schema in OpenAI-compatible format with proper parameter specifications and type definitions.


113-138: LGTM! Well-structured request handling with good debugging output.

The function properly constructs chat completion requests and handles both output formats correctly. The debug output will be valuable for users understanding the tool calling process.


140-158: LGTM! Well-configured argument parsing and client setup.

The argument parsing provides good defaults for testing, and the OpenAI client configuration is appropriate for the local TensorRT-LLM server setup described in the README.


24-34: LGTM!

The get_weather function and tool mapping follow proper naming conventions and provide a clean implementation for the demonstration.


65-88: LGTM!

Excellent implementation with proper regex parsing, safe argument evaluation using ast.literal_eval, and robust exception handling.


91-110: LGTM!

The tool definitions follow the correct schema format and include all required fields for proper function calling integration.


158-171: LGTM!

The tool execution logic properly handles both output formats and correctly calls the mapped functions with parsed arguments.


24-35: LGTM! Clean tool function implementation.

The weather function follows proper naming conventions and includes case-insensitive location handling. The tool mapping dictionary provides a clean approach for dynamic function calling.


91-111: LGTM! Well-structured tool definitions.

The function returns properly formatted tool definitions that follow the OpenAI standard schema with clear descriptions and parameter specifications.


140-171: LGTM! Well-structured main execution flow.

The main logic correctly implements the tool calling workflow with proper argument parsing, client setup, and tool execution. The conditional handling of different argument formats between parsing modes is appropriate.

examples/models/core/kimi_k2/README.md (14)

1-21: LGTM! Clear and comprehensive overview.

The overview provides excellent context about the K2 model's capabilities, and the tool calling process steps are clearly explained and align with the implementation in the example script.


44-51: LGTM! Accurate explanation of tool calling approaches.

The section correctly explains both approaches for tool calling and accurately notes the TensorRT-LLM limitation, which aligns with the manual parsing implementation in the example script.


52-98: LGTM! Excellent practical examples and clear workflow.

The example workflow provides clear step-by-step instructions with realistic command-line examples and expected outputs that align perfectly with the companion Python script.


99-120: LGTM! Important warnings and technical context provided.

The second example demonstrates the formatted output mode well, and the warning about output format deviations provides crucial context for users working with TensorRT-LLM deployments.


1-6: LGTM!

Clear and informative overview that provides essential context about the Kimi K2 model's capabilities and architecture.


7-21: LGTM!

The prerequisites section clearly outlines the tool calling process and aligns perfectly with the implementation in the example script.


44-52: LGTM!

Excellent explanation of the two approaches and the important limitation note about TensorRT-LLM's current capabilities.


54-98: LGTM!

The example usage section provides clear, accurate commands and realistic expected outputs that align perfectly with the example script's implementation.


99-120: LGTM!

Important warnings about output format deviation and parsing challenges provide crucial context for users working with TensorRT-LLM deployments.


1-6: LGTM! Clear and informative overview.

The overview effectively introduces the Kimi-K2 model with key specifications and capabilities, providing good context for users.


44-53: LGTM! Clear explanation of tool calling approaches.

This section effectively communicates the available approaches and the important limitation that TensorRT-LLM requires manual parsing. The information is crucial for users to understand the workflow.


54-67: LGTM! Standard server deployment instructions.

The server launch instructions follow the expected TensorRT-LLM deployment pattern with appropriate configuration options.


68-98: LGTM! Comprehensive and practical examples.

The usage examples provide clear command-line instructions with expected outputs, effectively demonstrating the tool calling workflow. The step-by-step explanations help users understand the process.


99-120: LGTM! Important warnings about format limitations.

The second example demonstrates the alternative output format effectively, and the note about potential format deviations is crucial information for users deploying with TensorRT-LLM.

Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
@lancelly lancelly force-pushed the example/k2_tool_use branch from 5041666 to 1dd0ab3 Compare August 6, 2025 13:53
@lancelly (Collaborator, Author) commented Aug 6, 2025

Could we avoid the merge commit in the history?

Done.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py (1)

177-177: Fix argparse boolean argument handling.

Using type=bool with argparse doesn't work as expected: argparse passes the raw command-line string to bool, and any non-empty string, including "False", is truthy, so the option evaluates to True whenever a value is supplied.

Apply this diff to fix the boolean argument:

-    parser.add_argument("--specify_output_format", type=bool, default=False)
+    parser.add_argument("--specify_output_format", action="store_true", default=False)
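The failure mode is easy to reproduce, since argparse applies the type callable to the raw string and every non-empty string is truthy. A quick sketch with invented flag names:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--bad_flag", type=bool, default=False)  # broken pattern
parser.add_argument("--good_flag", action="store_true")      # suggested fix

args = parser.parse_args(["--bad_flag", "False"])
print(args.bad_flag)   # True: bool("False") is True, the opposite of the intent
print(args.good_flag)  # False: store_true stays False when the flag is absent
```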
🧹 Nitpick comments (3)
examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py (3)

25-50: Fix remaining line length violations.

Several lines still exceed the 120-character limit despite the line continuation approach. Lines 30, 35, 36, 37, 45, and 46 need additional breaks.

Apply these fixes to resolve the remaining line length violations:

-and value1, value2, etc., with the actual parameter values. The output should start with a square bracket "[" and end with a square bracket "]".
+and value1, value2, etc., with the actual parameter values. \
+The output should start with a square bracket "[" and end with a square bracket "]".
-(parameters mentioned in the user input need to be included in the output; if not mentioned, they do not need to be included).
+(parameters mentioned in the user input need to be included in the output; \
+if not mentioned, they do not need to be included).
-If the API parameter description does not specify the required format for the value, use the user's original text for the parameter value. \
+If the API parameter description does not specify the required format for the value, \
+use the user's original text for the parameter value. \
-If the API requires no parameters, output the API request directly in the format [api_name()], and do not invent any nonexistent parameter names.
+If the API requires no parameters, output the API request directly in the format [api_name()], \
+and do not invent any nonexistent parameter names.
-<|tool_calls_section_begin|> and <|tool_calls_section_end|>, with each tool call wrapped by <|tool_call_begin|> and <|tool_call_end|>. \
+<|tool_calls_section_begin|> and <|tool_calls_section_end|>, \
+with each tool call wrapped by <|tool_call_begin|> and <|tool_call_end|>. \
-The tool ID and arguments are separated by <|tool_call_argument_begin|>. The format of the tool ID is functions.func_name:idx, \
+The tool ID and arguments are separated by <|tool_call_argument_begin|>. \
+The format of the tool ID is functions.func_name:idx, \

76-76: Fix line length violation.

Line 76 exceeds the 120-character limit.

Apply this fix:

-    func_call_pattern = r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[\w\.]+:\d+)\s*<\|tool_call_argument_begin\|>\s*(?P<function_arguments>.*?)\s*<\|tool_call_end\|>"
+    func_call_pattern = (r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[\w\.]+:\d+)\s*"
+                        r"<\|tool_call_argument_begin\|>\s*(?P<function_arguments>.*?)\s*"
+                        r"<\|tool_call_end\|>")

145-146: Fix line length violation.

Line 145 exceeds the 120-character limit.

Apply this fix:

-    system_prompt = SPECIFY_OUTPUT_FORMAT_PROMPT if args.specify_output_format else NOT_SPECIFY_OUTPUT_FORMAT_PROMPT.format(
-        tools=tools)
+    system_prompt = (SPECIFY_OUTPUT_FORMAT_PROMPT if args.specify_output_format 
+                    else NOT_SPECIFY_OUTPUT_FORMAT_PROMPT.format(tools=tools))
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 5041666 and 1dd0ab3.

📒 Files selected for processing (2)
  • examples/models/core/kimi_k2/README.md (1 hunks)
  • examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • examples/models/core/kimi_k2/README.md
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : all tensorrt-llm open source software code should contain...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T08:45:40.701Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Applied to files:

  • examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py
📚 Learning: applies to **/*.py : the code developed for tensorrt-llm should conform to python 3.8+....
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T08:45:40.701Z
Learning: Applies to **/*.py : The code developed for TensorRT-LLM should conform to Python 3.8+.

Applied to files:

  • examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py
📚 Learning: applies to **/*.{h,hpp} : use a preprocessor guard in header files. the guard name must have prefix ...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T08:45:40.701Z
Learning: Applies to **/*.{h,hpp} : Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Applied to files:

  • examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py
🪛 Ruff (0.12.2)
examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py

30-30: Line too long (144 > 120)

(E501)


35-35: Line too long (126 > 120)

(E501)


36-36: Line too long (140 > 120)

(E501)


37-37: Line too long (145 > 120)

(E501)


45-45: Line too long (136 > 120)

(E501)


46-46: Line too long (128 > 120)

(E501)


76-76: Line too long (165 > 120)

(E501)


145-145: Line too long (124 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (8)
examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py (8)

1-17: LGTM! Copyright header is correctly included.

The required NVIDIA copyright header with the current year is properly included at the top of the file, complying with the coding guidelines.


18-23: LGTM! Imports follow Python conventions.

The imports maintain proper namespace usage and follow the coding guidelines for TensorRT-LLM.


53-63: LGTM! Tool function and mapping are well-implemented.

The weather function provides a good example with simple hardcoded responses, and the tool mapping enables dynamic function calls as intended.
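The review does not quote the script itself; as a hedged sketch, a hardcoded weather tool and a name-to-callable mapping along these lines would support the dynamic dispatch described above (the `TOOL_FUNCTIONS` name and the return fields are assumptions, not the script's actual code):

```python
def get_weather(city: str) -> dict:
    """Return a hardcoded weather report; a stand-in for a real API call."""
    return {"city": city, "weather": "sunny", "temperature_c": 25}

# Map tool names to callables so a parsed tool call can be dispatched by name.
TOOL_FUNCTIONS = {"get_weather": get_weather}

result = TOOL_FUNCTIONS["get_weather"](city="Beijing")
print(result)
```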


67-91: LGTM! Tool call extraction logic is well-implemented.

The function correctly parses the custom delimiter format and handles multiple tool calls appropriately. The regex patterns and parsing logic are sound.


94-117: LGTM! Specified format parsing is robust.

The function correctly parses function call syntax using regex and safely evaluates arguments with AST literal evaluation, including proper fallback error handling.
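As a hedged sketch of that approach (the regexes and helper name are illustrative, not the script's actual implementation), `ast.literal_eval` converts quoted values safely, with the raw string kept as a fallback:

```python
import ast
import re

def parse_bracket_tool_calls(text: str) -> list:
    """Parse calls like [get_weather(city='Beijing')] into (name, kwargs) pairs."""
    calls = []
    for name, args_str in re.findall(r"(\w+)\(([^)]*)\)", text):
        kwargs = {}
        for key, value in re.findall(r"(\w+)\s*=\s*('[^']*'|\"[^\"]*\"|[\w\.]+)",
                                     args_str):
            try:
                # Safely evaluate Python literals such as "'Beijing'" or "25".
                kwargs[key] = ast.literal_eval(value)
            except (ValueError, SyntaxError):
                # Fall back to the raw text when the value is not a literal.
                kwargs[key] = value
        calls.append((name, kwargs))
    return calls

print(parse_bracket_tool_calls(
    "[get_weather(city='Beijing'), get_weather(city='Shanghai')]"))
```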


120-139: LGTM! Tool specification follows OpenAI format correctly.

The tool specification is properly structured with all required fields for function calling.
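For reference, a tool specification in this OpenAI function-calling format might look like the following (the description strings are illustrative, not copied from the example script):

```python
# OpenAI-style tool schema for a get_weather function with one required arg.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "Name of the city."},
            },
            "required": ["city"],
        },
    },
}]
```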


142-166: LGTM! Request orchestration is well-implemented.

The function correctly handles both output formats, constructs appropriate messages, and parses responses using the right parsing function based on the format.


169-176: LGTM! Main execution logic is well-structured.

The argument parsing, client setup, and tool execution flow are properly implemented. The script correctly handles both output formats and provides clear output for debugging.

Also applies to: 178-199

Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py (3)

25-50: Address remaining line length violations in prompt strings.

Several lines still exceed the 120-character limit despite previous improvements. Consider further breaking down the longer lines:

 and value1, value2, etc., with the actual parameter values. The output should start with a square bracket "[" and end with a square bracket "]".
-If there are multiple API requests, separate them with commas, for example: \
+If there are multiple API requests, separate them with commas, for example: \
 [api_name(key1='value1', key2='value2', ...), api_name(key1='value1', key2='value2', ...), ...]. \
-Do not include any other explanations, prompts, or API call results in the output.
-If the API parameter description does not specify otherwise, the parameter is optional \
-(parameters mentioned in the user input need to be included in the output; if not mentioned, they do not need to be included).
-If the API parameter description does not specify the required format for the value, use the user's original text for the parameter value. \
-If the API requires no parameters, output the API request directly in the format [api_name()], and do not invent any nonexistent parameter names.
+Do not include any other explanations, prompts, or API call results in the output.
+If the API parameter description does not specify otherwise, the parameter is optional \
+(parameters mentioned in the user input need to be included in the output; \
+if not mentioned, they do not need to be included).
+If the API parameter description does not specify the required format for the value, \
+use the user's original text for the parameter value.
+If the API requires no parameters, output the API request directly in the format [api_name()], \
+and do not invent any nonexistent parameter names.

76-76: Fix line length violation in regex pattern.

The regex pattern exceeds the 120-character limit.

-    func_call_pattern = r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[\w\.]+:\d+)\s*<\|tool_call_argument_begin\|>\s*(?P<function_arguments>.*?)\s*<\|tool_call_end\|>"
+    func_call_pattern = (
+        r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[\w\.]+:\d+)\s*"
+        r"<\|tool_call_argument_begin\|>\s*(?P<function_arguments>.*?)\s*<\|tool_call_end\|>"
+    )

145-146: Fix line length violation in system prompt assignment.

The conditional assignment exceeds the line limit.

-    system_prompt = SPECIFY_OUTPUT_FORMAT_PROMPT if args.specify_output_format else NOT_SPECIFY_OUTPUT_FORMAT_PROMPT.format(
-        tools=tools)
+    system_prompt = (
+        SPECIFY_OUTPUT_FORMAT_PROMPT if args.specify_output_format 
+        else NOT_SPECIFY_OUTPUT_FORMAT_PROMPT.format(tools=tools)
+    )
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1dd0ab3 and a18ebc4.

📒 Files selected for processing (2)
  • examples/models/core/kimi_k2/README.md (1 hunks)
  • examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/models/core/kimi_k2/README.md
🔇 Additional comments (8)
examples/models/core/kimi_k2/kimi_k2_tool_calling_example.py (8)

1-17: LGTM! Copyright header properly implemented.

The NVIDIA copyright header is correctly formatted with the current year and Apache 2.0 license, meeting the coding guidelines requirement.


18-23: LGTM! Import organization follows best practices.

The imports are properly organized with standard library imports first, followed by third-party imports, and maintain proper namespace usage as required by the coding guidelines.


53-63: LGTM! Clean example tool implementation.

The get_weather function and tool mapping follow proper Python conventions with snake_case naming, type hints, and clear logic that serves well as a demonstration tool for the example.


67-91: LGTM! Well-structured parsing for K2 model format.

The extract_tool_call_info function correctly handles the custom delimiter format specific to the K2 model with proper regex parsing and structured output.


94-117: LGTM! Robust parsing with proper error handling.

The parse_specified_format_tool_calls function implements solid regex parsing with appropriate error handling for malformed arguments using ast.literal_eval with fallback.


120-139: LGTM! Proper tool schema definition.

The get_tools function correctly defines the tool schema in OpenAI format with appropriate type definitions, required parameters, and clear descriptions that match the actual get_weather implementation.


142-166: LGTM! Well-orchestrated tool calling workflow.

The function properly handles both output formats, makes appropriate API calls, and provides good debugging visibility. The logic flow is clear and correct.


169-201: LGTM! Clean main execution with proper argument handling.

The main block demonstrates proper usage patterns with correct argparse boolean handling (fixed from previous review), appropriate client configuration, and clear result processing for both output formats.

@svc-trtllm-gh-bot svc-trtllm-gh-bot added the Community want to contribute PRs initiated from Community label Aug 6, 2025
@lancelly
Collaborator Author

lancelly commented Aug 7, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #14350 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #14350 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #10844 completed with status: 'FAILURE'

@lancelly
Collaborator Author

lancelly commented Aug 7, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #14355 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #14355 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #10848 completed with status: 'FAILURE'

@lancelly
Collaborator Author

lancelly commented Aug 7, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #14375 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #14375 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #10865 completed with status: 'FAILURE'

@lancelly
Collaborator Author

lancelly commented Aug 8, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #14536 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #14536 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10981 completed with status: 'SUCCESS'

@litaotju
Collaborator

litaotju commented Aug 8, 2025

overall LGTM. Added a few comments.

Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
@lancelly
Collaborator Author

lancelly commented Aug 8, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #14619 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #14619 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11044 completed with status: 'SUCCESS'

@litaotju litaotju merged commit a2e9153 into NVIDIA:main Aug 11, 2025
4 checks passed
MartinMarciniszyn added a commit to MartinMarciniszyn/TensorRT-LLM that referenced this pull request Aug 12, 2025
@lancelly lancelly deleted the example/k2_tool_use branch August 15, 2025 03:25