BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4150) #4426
kimimgo wants to merge 3 commits into unslothai:main
Conversation
…on_prompt (unslothai#4150)

ChatML-style templates (Hermes, Magnum, etc.) saved after LoRA training may end with {% endfor %} and have no {% if add_generation_prompt %} block. The existing _fix_chat_template only handled the case where content ({{ ... }}) followed endfor, not the case where the template simply ended. Add an elif branch that detects ChatML templates (containing <|im_start|>) with an empty suffix after endfor, and appends the standard ChatML generation prompt block.

Fixes unslothai#4150
Summary of Changes (Gemini Code Assist): This pull request addresses a runtime error in the tokenizer utility that prevented certain ChatML-style templates from being correctly processed after LoRA training. By enhancing the template-fixing logic, it ensures that valid ChatML templates, even those missing the generation prompt block, are properly formatted, improving the robustness and compatibility of the tokenizer.
Code Review
This pull request addresses a bug in _fix_chat_template() that caused RuntimeError for ChatML-style templates missing the add_generation_prompt block. The fix adds an elif branch to handle templates with an empty suffix after endfor, appending the standard ChatML generation prompt block. The review focuses on the correctness and maintainability of the added code, ensuring it effectively resolves the issue and integrates well with the existing logic.
elif after_endfor.strip() == "" and "<|im_start|>" in chat_template:
    # GH#4150: ChatML-style templates (Hermes, etc.) that end with
    # {% endfor %} and have no add_generation_prompt block.
    # Append the standard ChatML generation prompt.
    generation_block = (
        "{%" + dash + " if add_generation_prompt %}"
        "{{ '<|im_start|>assistant\n' }}"
        "{%" + dash + " endif %}"
    )
    chat_template = chat_template[: where + len(chosen_end)] + generation_block

return chat_template
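To make the effect of this branch concrete, here is a hypothetical standalone sketch (the real _fix_chat_template() derives where, chosen_end, and dash from earlier pattern matching; the function name and simplifications below are assumptions) applied to a Hermes-style template that ends right after its message loop:

```python
# Hypothetical standalone sketch of the elif branch above; the real
# _fix_chat_template() tracks `where`, `chosen_end`, and `dash` from
# earlier matching, which is simplified here.
def append_generation_prompt(chat_template: str, dash: str = "") -> str:
    chosen_end = "{% endfor %}"
    where = chat_template.rfind(chosen_end)
    if where == -1:
        return chat_template
    after_endfor = chat_template[where + len(chosen_end):]
    if after_endfor.strip() == "" and "<|im_start|>" in chat_template:
        generation_block = (
            "{%" + dash + " if add_generation_prompt %}"
            "{{ '<|im_start|>assistant\n' }}"
            "{%" + dash + " endif %}"
        )
        return chat_template[: where + len(chosen_end)] + generation_block
    return chat_template

# A Hermes-style template that ends right after the loop:
template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}"
    "{% endfor %}"
)
fixed = append_generation_prompt(template)
```

A template whose suffix after {% endfor %} is non-empty falls through unchanged, matching the existing behavior.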
This elif block appends a generation prompt to ChatML templates that have an empty suffix after endfor. It's important to ensure that this addition doesn't inadvertently affect other types of templates or introduce unintended side effects. Consider adding a more specific check to ensure this logic only applies to the intended ChatML templates to avoid potential issues in the future.
It is also important to ensure that the appended generation_block is correctly formatted and compatible with all ChatML-style templates.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 48119ee39c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
unsloth/tokenizer_utils.py
Outdated
generation_block = (
    "{%" + dash + " if add_generation_prompt %}"
    "{{ '<|im_start|>assistant\n' }}"
Don't hard-code <|im_start|>assistant\n for all ChatML fixes
This new branch assumes every <|im_start|> template should generate with '<|im_start|>assistant\n', but we already ship supported variants that use a different assistant prefix. For example, unsloth/chat_templates.py:614-626 defines Phi-4 with '<|im_start|>assistant<|im_sep|>'; if a saved Phi-4 tokenizer loses its add_generation_prompt block and hits _fix_chat_template(), this patch rewrites it to the wrong protocol and produces malformed prompts at inference time. The fix needs to preserve the model-specific assistant prefix instead of treating all ChatML-style templates as Hermes-style newline prompts.
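The mismatch can be illustrated with a small sketch (the Phi-4 prefix is the one the comment quotes from unsloth/chat_templates.py; the template string and variable names here are simplified hypotheticals):

```python
# A simplified Phi-4-style ChatML template: role and content are joined
# with <|im_sep|>, not a newline.
phi4_like_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '<|im_sep|>' + message['content'] + '<|im_end|>' }}"
    "{% endfor %}"
)

# The generation block the original patch would append, hard-coding the
# Hermes-style newline prefix.
hard_coded_block = "{{ '<|im_start|>assistant\n' }}"

# The prefix this template actually needs for the assistant turn:
required_prefix = "<|im_start|>assistant<|im_sep|>"

# The hard-coded block lacks the variant's required prefix, so the
# patched template would emit a malformed prompt at inference time.
mismatch = required_prefix not in hard_coded_block
```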
…t prefix

Address review feedback: Phi-4 uses <|im_sep|> instead of a newline after the role. Extract the separator from the template pattern (message['role'] + 'SEPARATOR') to generate the correct assistant prefix for each ChatML variant.
Force-pushed 48119ee to 56764e6 (Compare)
for more information, see https://pre-commit.ci
💡 Codex Review
Reviewed commit: e84d889eef
sep_match = re.search(r"message\['role'\]\s*\+\s*'([^']*)'", chat_template)
separator = sep_match.group(1) if sep_match else "\\n"
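A quick sketch of how this separator inference behaves (the regex is the one from the diff; the template strings and helper name are hypothetical). It matches Hermes-style templates that build each turn as message['role'] + '<sep>', but not templates with hard-coded per-role literals, where it silently falls back:

```python
import re

def infer_separator(chat_template: str) -> str:
    # Regex from the diff: capture the literal appended right after
    # message['role'] (e.g. '\n' for Hermes-style templates).
    sep_match = re.search(r"message\['role'\]\s*\+\s*'([^']*)'", chat_template)
    return sep_match.group(1) if sep_match else "\\n"

# Hermes-style: the role is concatenated with a separator, so the regex
# captures the real newline.
hermes = "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] }}"

# Phi-4-style per the review: the assistant branch is a hard-coded
# literal, so there is nothing for the regex to match and the fallback
# (a literal backslash-n) is used, which is wrong for this variant.
phi4 = "{{ '<|im_start|>assistant<|im_sep|>' + message['content'] }}"

sep_hermes = infer_separator(hermes)
sep_phi4 = infer_separator(phi4)
```

This is exactly the gap the comment below describes: on a Phi-4-style template the fallback wins and the wrong prefix is generated.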
Handle hard-coded ChatML role branches when inferring separator
Fresh evidence: unsloth/chat_templates.py:616-625 defines phi4_template with separate system/user/assistant branches, so the new regex here never matches. In that supported case, a saved Phi-4 tokenizer that is missing its trailing add_generation_prompt block still enters this branch, falls back to "\\n", and gets patched to '<|im_start|>assistant\n' instead of the required '<|im_start|>assistant<|im_sep|>', so inference prompts remain malformed for Phi-4-style tokenizers.
Fixes #4150

Problem

ChatML-style templates (Hermes-3, Magnum-v2, etc.) saved after LoRA training may end with {% endfor %} and have no {% if add_generation_prompt %} block. This causes fix_chat_template() to raise a RuntimeError.

Root Cause

_fix_chat_template() only patches templates where content ({{ ... }}) follows {% endfor %}. When the template simply ends after {% endfor %} (empty suffix), the function returns the template unchanged, and the caller raises the RuntimeError.

Fix

Add an elif branch in _fix_chat_template() that detects ChatML templates (containing <|im_start|>) with an empty suffix after endfor, and appends the standard ChatML generation prompt block.

This is a general fix (not a model-name-based bypass) that handles any ChatML-style template missing the generation prompt block.
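Combining the two pieces of the PR, here is a hypothetical standalone sketch of the final separator-aware patch (simplified; the real code lives in unsloth/tokenizer_utils.py and carries more state, and the function name below is an assumption):

```python
import re

def build_generation_block(chat_template: str, dash: str = "") -> str:
    # Infer this ChatML variant's role separator ('\n' for Hermes-style
    # templates, '<|im_sep|>' when the template builds turns that way);
    # fall back to a newline if no pattern is found.
    sep_match = re.search(r"message\['role'\]\s*\+\s*'([^']*)'", chat_template)
    separator = sep_match.group(1) if sep_match else "\n"
    return (
        "{%" + dash + " if add_generation_prompt %}"
        "{{ '<|im_start|>assistant" + separator + "' }}"
        "{%" + dash + " endif %}"
    )

hermes_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}"
    "{% endfor %}"
)
block = build_generation_block(hermes_template)
```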