feat: Add DeepSeek R1 and distilled model chat format support #1

Open
ljluestc wants to merge 2 commits into main from feat/deepseek-r1-support

@ljluestc
Owner

Summary

Add native chat format support for DeepSeek R1 and its distilled model variants (Qwen-based and Llama-based), including auto-detection from GGUF metadata.

Closes abetlen#1952

Changes

Chat Format (llama_chat_format.py)

  • Add deepseek-r1 chat format with correct special tokens (<|User|>, <|Assistant|>, <|begin▁of▁sentence|>, <|end▁of▁sentence|>)
  • Add deepseek-r1-distill-qwen and deepseek-r1-distill-llama chat format aliases for distilled model variants
  • Add DEEPSEEK_R1_CHAT_TEMPLATE constant sourced from the official HuggingFace tokenizer config
  • Handle </think> reasoning content stripping for multi-turn conversations so chain-of-thought tokens don't leak into context
  • Set added_special=True in ChatFormatterResponse to prevent double BOS token insertion
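
The formatting behaviour described in the bullets above can be sketched as follows. This is a minimal illustration, not the code from the PR: the function name `format_deepseek_r1_sketch`, the helper `strip_reasoning`, and the tuple return value are hypothetical stand-ins (the PR itself returns a `ChatFormatterResponse`), the token strings are copied as they appear in this description, and the exact template layout is an assumption.

```python
from typing import Dict, List, Tuple

# Special tokens as written in the PR description above.
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"
USER = "<|User|>"
ASSISTANT = "<|Assistant|>"


def strip_reasoning(content: str) -> str:
    """Keep only the text after the last </think> tag, if one is present."""
    if "</think>" in content:
        return content.split("</think>")[-1]
    return content


def format_deepseek_r1_sketch(messages: List[Dict[str, str]]) -> Tuple[str, bool]:
    """Hypothetical formatter: returns (prompt, added_special)."""
    # Assumption: system messages are concatenated right after the BOS token.
    system = "".join(m["content"] for m in messages if m["role"] == "system")
    prompt = BOS + system
    for m in messages:
        if m["role"] == "user":
            prompt += USER + m["content"]
        elif m["role"] == "assistant":
            # Earlier reasoning is dropped so it does not leak into context.
            prompt += ASSISTANT + strip_reasoning(m["content"]) + EOS
    # Generation prompt: the model continues from the assistant tag.
    prompt += ASSISTANT
    # The prompt already starts with BOS, so report added_special=True.
    return prompt, True
```

Because the prompt string already begins with the BOS token, the second return value plays the role of `added_special=True`, telling the caller not to add another one.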

Auto-Detection (guess_chat_format_from_gguf_metadata)

  • Exact match against the canonical DeepSeek R1 chat template
  • Heuristic fallback: detect DeepSeek-family models by checking for characteristic <|User|> and <|Assistant|> tokens in the template
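
The two-stage detection above could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's implementation: `guess_chat_format_sketch` is a hypothetical name, `CANONICAL_TEMPLATE` is a placeholder string rather than the real template text, and reading the template from the `tokenizer.chat_template` metadata key is assumed here.

```python
from typing import Dict, Optional

# Placeholder only: the real constant holds the full template text.
CANONICAL_TEMPLATE = "<placeholder for the canonical DeepSeek R1 template>"


def guess_chat_format_sketch(metadata: Dict[str, str]) -> Optional[str]:
    """Hypothetical detector mirroring the exact-match-then-heuristic order."""
    # Assumed metadata key for the embedded chat template.
    template = metadata.get("tokenizer.chat_template")
    if template is None:
        return None  # missing template: nothing to detect
    # Stage 1: exact match against the canonical template.
    if template == CANONICAL_TEMPLATE:
        return "deepseek-r1"
    # Stage 2: heuristic fallback on the characteristic role tokens.
    if "<|User|>" in template and "<|Assistant|>" in template:
        return "deepseek-r1"
    return None
```

Requiring both tokens in the fallback keeps the heuristic from firing on templates that happen to contain only one of them.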

Submodule Update

  • Update vendor/llama.cpp submodule to latest (3191462) for full DeepSeek R1/V2/V3 architecture support

Version Bump

  • Bump version to 0.3.17

Tests (test_llama_chat_format.py)

  • Single-turn and multi-turn conversation formatting
  • System message handling
  • </think> reasoning content stripping
  • Distilled model aliases produce identical output
  • Auto-detection via exact template match and heuristic token detection
  • Negative cases (no match, missing template)
  • Added module stubs so chat format tests run without compiling the native library
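
The last bullet refers to stubbing out modules so tests can import without the compiled library. One common way to do this in Python is to register a stand-in module in `sys.modules` before the import happens. The sketch below shows that general technique only; the module name `hypothetical_native_lib` and its `native_call` attribute are invented for illustration and are not the names used in the PR.

```python
import sys
import types

# Build an empty stand-in module and give it the attributes the code under
# test expects. Both names here are hypothetical.
stub = types.ModuleType("hypothetical_native_lib")
stub.native_call = lambda *args, **kwargs: None

# Register the stub before anything imports the real module.
sys.modules["hypothetical_native_lib"] = stub

# This import now resolves to the stub instead of a compiled extension.
import hypothetical_native_lib
```

The registration has to run before the first import of the stubbed module, which is why such stubs usually sit at the top of a test file or in a `conftest.py`.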

Testing

pytest tests/test_llama_chat_format.py -v

Co-Authored-By: Oz <oz-agent@warp.dev>

ljluestc added 2 commits March 1, 2026 12:30
- Update llama.cpp submodule to latest (b8184) for full DeepSeek R1/V2/V3 architecture support
- Add 'deepseek-r1' chat format with correct special tokens (<|User|>, <|Assistant|>, <|begin▁of▁sentence|>, <|end▁of▁sentence|>)
- Add 'deepseek-r1-distill-qwen' and 'deepseek-r1-distill-llama' chat format aliases for distilled model variants
- Add DEEPSEEK_R1_CHAT_TEMPLATE constant from official HuggingFace tokenizer config
- Update guess_chat_format_from_gguf_metadata() to auto-detect DeepSeek R1 models via template matching and heuristic token detection
- Handle </think> reasoning content stripping for multi-turn conversations
- Bump version to 0.3.17

Closes abetlen#1952

The format_deepseek_r1 function already includes the BOS token
(<|begin▁of▁sentence|>) in the formatted prompt, but was not setting
added_special=True in the ChatFormatterResponse. This caused
chat_formatter_to_chat_completion_handler to pass add_bos=True to the
tokenizer, resulting in a duplicate BOS token.
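
The duplication described in this commit message can be reproduced with a toy model. The sketch assumes the handler derives `add_bos` as the negation of `added_special`; `toy_tokenize` and `tokenize_prompt` are hypothetical stand-ins, not functions from the library.

```python
from typing import List

BOS = "<|begin▁of▁sentence|>"


def toy_tokenize(text: str, add_bos: bool) -> List[str]:
    """Hypothetical tokenizer: BOS is one token, the rest is one chunk."""
    tokens = [BOS] if add_bos else []
    if text.startswith(BOS):
        # A BOS marker already present in the text is parsed as a token too.
        tokens.append(BOS)
        text = text[len(BOS):]
    if text:
        tokens.append(text)
    return tokens


def tokenize_prompt(prompt: str, added_special: bool) -> List[str]:
    # Assumed handler behaviour: only add BOS if the formatter has not.
    return toy_tokenize(prompt, add_bos=not added_special)


prompt = BOS + "<|User|>Hi<|Assistant|>"
before_fix = tokenize_prompt(prompt, added_special=False)  # BOS appears twice
after_fix = tokenize_prompt(prompt, added_special=True)    # BOS appears once
```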

Also adds comprehensive tests for:
- Single-turn and multi-turn conversations
- System message handling
- </think> reasoning content stripping
- Distilled model aliases (qwen/llama)
- Auto-detection via exact match and heuristic

Development

Successfully merging this pull request may close these issues.

Update llama cpp; DeepSeek R1 and distilled models are currently not supported
