
Conversation

Collaborator

@JadoTu JadoTu commented Nov 20, 2025

Summary by CodeRabbit

  • Bug Fixes
    • Improved stop-word token handling for the Qwen3 Next model. During sampling-parameter setup, additional end-of-sequence tokens from the generation config are now detected and appended to the stop-word list when not already present, so text generation terminates correctly and behaves consistently across inference scenarios and deployment configurations.


Description

Modify qwen3-next sampling stop_tokens: append EOS tokens from generation_config to the stop-word list when they are not already present.

Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
Contributor

coderabbitai bot commented Nov 20, 2025

📝 Walkthrough

Walkthrough

Adds model-specific logic to extend the stop-word vocabulary during sampling-parameter setup for the qwen3_next model. When generation_config.eos_token_id is a list of integers, EOS tokens not already present in the stop-word list are computed and appended.

Changes

Cohort: Model-specific stop-word vocabulary extension
File(s): tensorrt_llm/sampling_params.py
Summary: Adds conditional logic to extend the stop-word vocabulary with EOS tokens from generation_config when model_type is "qwen3_next". Computes the difference between generation_config.eos_token_id and the existing stop-word ids, then appends the new tokens.
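The change summarized above can be sketched roughly as follows. This is an illustrative approximation, not the actual diff: the function shape, parameter names (`model_type`, `eos_token_id`, `stop_token_ids`), and the sample token ids are stand-ins for whatever `tensorrt_llm/sampling_params.py` actually uses.

```python
# Hypothetical sketch of the qwen3_next stop-token extension described in the
# walkthrough. Names and token ids are illustrative, not the real API.

def extend_stop_tokens(model_type, eos_token_id, stop_token_ids):
    """Append EOS tokens from the generation config that are missing from
    the existing stop-word ids. Applied only for the qwen3_next model."""
    if model_type != "qwen3_next":
        return stop_token_ids
    # The walkthrough covers the case where eos_token_id is a list of ints;
    # other shapes (a single int, None) are left untouched here.
    if isinstance(eos_token_id, list) and all(
            isinstance(t, int) for t in eos_token_id):
        new_tokens = [t for t in eos_token_id if t not in stop_token_ids]
        stop_token_ids = stop_token_ids + new_tokens
    return stop_token_ids

# Example: 151645 is already a stop token, so only 151643 gets appended.
print(extend_stop_tokens("qwen3_next", [151643, 151645], [151645]))
```

Note that a single-int `eos_token_id` deliberately falls through unchanged in this sketch, matching the walkthrough's "list of integers" condition.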

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify the token comparison logic correctly identifies tokens not already in the stop word list
  • Confirm the conditional check for model_type == "qwen3_next" is the intended scope
  • Ensure the append operation maintains list integrity and doesn't cause unintended side effects
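The three review points above can be checked with a small stand-alone snippet. This models only the generic list-difference-and-append pattern, under the assumption that the PR does something equivalent; `append_missing` is a hypothetical helper, not code from the repository.

```python
# Illustrative check of the properties the review bullets ask about,
# using a stand-alone list-difference-and-append, not the real implementation.

def append_missing(eos_ids, stop_ids):
    # Returns a new list; the original stop_ids is not mutated.
    # Note: a token repeated within eos_ids would be appended more than once,
    # since membership is checked only against stop_ids.
    return stop_ids + [t for t in eos_ids if t not in stop_ids]

existing = [100, 200]
result = append_missing([200, 300], existing)

assert result == [100, 200, 300]   # only the missing token is appended, at the end
assert result.count(200) == 1      # no duplicate for tokens already present
assert existing == [100, 200]      # no side effect on the input list
```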

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description check ⚠️ Warning — The PR description is incomplete: it lacks a detailed explanation of the issue, the solution rationale, and test-coverage information, and it does not follow the repository's description template. Resolution: add Description (what and why) and Test Coverage sections, complete the PR Checklist, and explain the qwen3-next model-specific logic and why the change was necessary.
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (1 passed)
  • Title check ✅ Passed — The title clearly specifies the model-specific change (qwen3-next sampling stop_tokens) and the fix type, directly reflecting the main change in the PR.

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1bdd3ba and cd3dd45.

📒 Files selected for processing (1)
  • tensorrt_llm/sampling_params.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-18T08:42:02.640Z
Learnt from: samuellees
Repo: NVIDIA/TensorRT-LLM PR: 6974
File: tensorrt_llm/serve/scripts/benchmark_dataset.py:558-566
Timestamp: 2025-08-18T08:42:02.640Z
Learning: In TensorRT-LLM's RandomDataset (tensorrt_llm/serve/scripts/benchmark_dataset.py), when using --random-token-ids option, sequence length accuracy is prioritized over semantic correctness for benchmarking purposes. The encode/decode operations should use skip_special_tokens=True and add_special_tokens=False to ensure exact target token lengths.

Applied to files:

  • tensorrt_llm/sampling_params.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

Collaborator Author

JadoTu commented Nov 21, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #25283 [ run ] triggered by Bot. Commit: cd3dd45

@tensorrt-cicd
Collaborator

PR_Github #25283 [ run ] completed with state SUCCESS. Commit: cd3dd45
/LLM/main/L0_MergeRequest_PR pipeline #19128 completed with status: 'FAILURE'

Collaborator Author

JadoTu commented Nov 21, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #25311 [ run ] triggered by Bot. Commit: cd3dd45

@tensorrt-cicd
Collaborator

PR_Github #25311 [ run ] completed with state SUCCESS. Commit: cd3dd45
/LLM/main/L0_MergeRequest_PR pipeline #19148 completed with status: 'FAILURE'

Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
Collaborator Author

JadoTu commented Nov 22, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #25417 [ run ] triggered by Bot. Commit: 18925a8

@tensorrt-cicd
Collaborator

PR_Github #25417 [ run ] completed with state SUCCESS. Commit: 18925a8
/LLM/main/L0_MergeRequest_PR pipeline #19233 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@nv-guomingz nv-guomingz merged commit 0582e54 into NVIDIA:main Nov 23, 2025
5 checks passed
codego7250 pushed a commit to codego7250/TensorRT-LLM that referenced this pull request Dec 11, 2025
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>