feat: Add DeepSeek R1 and distilled model chat format support by ljluestc · Pull Request #1 · ljluestc/llama-cpp-python

ljluestc · 2026-03-22T05:44:47Z

Summary

Add native chat format support for DeepSeek R1 and its distilled model variants (Qwen-based and Llama-based), including auto-detection from GGUF metadata.

Closes abetlen#1952

Changes

Chat Format (`llama_chat_format.py`)

Add deepseek-r1 chat format with correct special tokens (<｜User｜>, <｜Assistant｜>, <｜begin▁of▁sentence｜>, <｜end▁of▁sentence｜>)
Add deepseek-r1-distill-qwen and deepseek-r1-distill-llama chat format aliases for distilled model variants
Add DEEPSEEK_R1_CHAT_TEMPLATE constant sourced from the official HuggingFace tokenizer config
Handle </think> reasoning content stripping for multi-turn conversations so chain-of-thought tokens don't leak into context
Set added_special=True in ChatFormatterResponse to prevent double BOS token insertion
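
The formatting rules above can be sketched roughly as follows. This is an illustrative approximation, not the code in this PR: `format_deepseek_r1_sketch` and `strip_reasoning` are hypothetical names, and the exact layout (system text directly after BOS, no whitespace trimming) is an assumption rather than a copy of the official template.

```python
from typing import Dict, List

# Special tokens as listed in the PR description.
BOS = "<｜begin▁of▁sentence｜>"
EOS = "<｜end▁of▁sentence｜>"
USER = "<｜User｜>"
ASSISTANT = "<｜Assistant｜>"


def strip_reasoning(content: str) -> str:
    """Keep only the text after the last </think> tag, if one is present."""
    if "</think>" in content:
        return content.split("</think>")[-1]
    return content


def format_deepseek_r1_sketch(messages: List[Dict[str, str]]) -> str:
    """Hypothetical helper: render messages into a DeepSeek R1 style prompt."""
    # Assumption: system messages are concatenated right after the BOS token.
    system = "".join(m["content"] for m in messages if m["role"] == "system")
    prompt = BOS + system
    for m in messages:
        if m["role"] == "user":
            prompt += USER + m["content"]
        elif m["role"] == "assistant":
            # Earlier assistant turns are stored without their reasoning.
            prompt += ASSISTANT + strip_reasoning(m["content"]) + EOS
    # Generation prompt: the model continues as the assistant.
    return prompt + ASSISTANT
```

Because the prompt already starts with the BOS token, a caller of such a formatter would need to tell the tokenizer not to add another one, which is what the `added_special=True` change addresses.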

Auto-Detection (`guess_chat_format_from_gguf_metadata`)

Exact match against the canonical DeepSeek R1 chat template
Heuristic fallback: detect DeepSeek-family models by checking for characteristic <｜User｜> and <｜Assistant｜> tokens in the template
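
A minimal sketch of the two-stage detection described above, under stated assumptions: `guess_format_sketch` is a hypothetical stand-in for the changed function, `CANONICAL_TEMPLATE` is a placeholder rather than the real `DEEPSEEK_R1_CHAT_TEMPLATE` text, and the template is assumed to live under the `tokenizer.chat_template` metadata key.

```python
from typing import Dict, Optional

# Placeholder only; the real canonical template text is not reproduced here.
CANONICAL_TEMPLATE = "<placeholder for the canonical DeepSeek R1 template>"


def guess_format_sketch(metadata: Dict[str, str]) -> Optional[str]:
    """Hypothetical detection: exact match first, then a token heuristic."""
    template = metadata.get("tokenizer.chat_template")
    if template is None:
        # Missing template: nothing to detect.
        return None
    if template == CANONICAL_TEMPLATE:
        # Stage 1: exact match against the canonical template.
        return "deepseek-r1"
    if "<｜User｜>" in template and "<｜Assistant｜>" in template:
        # Stage 2: both characteristic role tokens appear in the template.
        return "deepseek-r1"
    return None
```

Requiring both role tokens in the fallback keeps the heuristic from firing on templates that merely mention one of them.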

Submodule Update

Update vendor/llama.cpp submodule to latest (3191462) for full DeepSeek R1/V2/V3 architecture support

Version Bump

Bump version to 0.3.17

Tests (`test_llama_chat_format.py`)

Single-turn and multi-turn conversation formatting
System message handling
</think> reasoning content stripping
Distilled model aliases produce identical output
Auto-detection via exact template match and heuristic token detection
Negative cases (no match, missing template)
Added module stubs so chat format tests run without compiling the native library
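
One common way to stub a module so pure-Python code can be imported without a compiled extension is to register a placeholder in `sys.modules` before the import happens. The sketch below shows that general technique only; `fake_native_lib`, `native_call`, and `install_stub` are hypothetical names, and the stubs in this PR may be built differently.

```python
import sys
import types


def install_stub(name: str, **attrs) -> types.ModuleType:
    """Register a placeholder module so later imports of `name` succeed."""
    module = types.ModuleType(name)
    for key, value in attrs.items():
        setattr(module, key, value)
    sys.modules[name] = module
    return module


# Hypothetical stand-in for a compiled extension module.
install_stub("fake_native_lib", native_call=lambda *args: 1)

# The import system finds the stub in sys.modules and never looks for a binary.
import fake_native_lib
```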

Testing

pytest tests/test_llama_chat_format.py -v

Co-Authored-By: Oz <oz-agent@warp.dev>

- Update llama.cpp submodule to latest (b8184) for full DeepSeek R1/V2/V3 architecture support
- Add 'deepseek-r1' chat format with correct special tokens (<｜User｜>, <｜Assistant｜>, <｜begin▁of▁sentence｜>, <｜end▁of▁sentence｜>)
- Add 'deepseek-r1-distill-qwen' and 'deepseek-r1-distill-llama' chat format aliases for distilled model variants
- Add DEEPSEEK_R1_CHAT_TEMPLATE constant from official HuggingFace tokenizer config
- Update guess_chat_format_from_gguf_metadata() to auto-detect DeepSeek R1 models via template matching and heuristic token detection
- Handle </think> reasoning content stripping for multi-turn conversations
- Bump version to 0.3.17

Closes abetlen#1952

The format_deepseek_r1 function already includes the BOS token (<｜begin▁of▁sentence｜>) in the formatted prompt, but was not setting added_special=True in the ChatFormatterResponse. This caused chat_formatter_to_chat_completion_handler to pass add_bos=True to the tokenizer, resulting in a duplicate BOS token.

Also adds comprehensive tests for:
- Single-turn and multi-turn conversations
- System message handling
- </think> reasoning content stripping
- Distilled model aliases (qwen/llama)
- Auto-detection via exact match and heuristic
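
The duplicate-BOS behavior described in this commit can be reproduced with a toy model. Everything here is illustrative: `FormatterResponseSketch`, `toy_tokenize`, and `tokenize_response` are hypothetical stand-ins, and the only behavior taken from the commit message is that `add_bos` ends up true when `added_special` is not set.

```python
from dataclasses import dataclass
from typing import List

BOS_TEXT = "<｜begin▁of▁sentence｜>"
BOS_ID = 1


@dataclass
class FormatterResponseSketch:
    """Hypothetical stand-in for ChatFormatterResponse."""
    prompt: str
    added_special: bool = False


def toy_tokenize(text: str, add_bos: bool) -> List[int]:
    """Toy tokenizer: BOS text maps to BOS_ID, every other word to id 2."""
    tokens = [BOS_ID] if add_bos else []
    if text.startswith(BOS_TEXT):
        tokens.append(BOS_ID)
        text = text[len(BOS_TEXT):]
    tokens.extend(2 for _ in text.split())
    return tokens


def tokenize_response(resp: FormatterResponseSketch) -> List[int]:
    # The handler adds a BOS token only when the formatter did not.
    return toy_tokenize(resp.prompt, add_bos=not resp.added_special)


before_fix = FormatterResponseSketch(BOS_TEXT + "hello world")
after_fix = FormatterResponseSketch(BOS_TEXT + "hello world", added_special=True)
print(tokenize_response(before_fix))  # two BOS ids at the start
print(tokenize_response(after_fix))   # a single BOS id
```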

ljluestc added 2 commits March 1, 2026 12:30

ljluestc wants to merge 2 commits into main from feat/deepseek-r1-support