[None][fix] modify qwen3-next sampling stop_tokens #9331
Conversation
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
Walkthrough: Adds model-specific logic to extend the stop-word vocabulary during sampling-parameter setup for the qwen3_next model. When generation_config.eos_token_id is a list of integers, EOS tokens not already present in the stop-word list are computed and appended.
Files changed (1): tensorrt_llm/sampling_params.py (1 hunk)
/bot run

PR_Github #25283 [ run ] triggered by Bot. Commit:
PR_Github #25283 [ run ] completed with state
/bot run

PR_Github #25311 [ run ] triggered by Bot. Commit:
PR_Github #25311 [ run ] completed with state
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
/bot run

PR_Github #25417 [ run ] triggered by Bot. Commit:
PR_Github #25417 [ run ] completed with state
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
Description
Modify qwen3-next sampling stop_tokens: during sampling-parameter setup, extend the stop-word list with any EOS tokens from generation_config.eos_token_id (when it is a list of integers) that are not already present.