Add PLaMo 3 model support #1234
Conversation
Hi @angeloskath, sorry for the direct ping. This PR has been ready for review for a few days and currently has no reviewer assigned. The diff is intentionally scoped to native PLaMo 3 model support plus focused model tests.
I also re-ran the focused test locally (`python -m pytest tests/test_models.py -k plamo3 -q`).
Would you or another maintainer be able to take a look when you have bandwidth?
Hi @angeloskath, sorry for the follow-up ping, and thank you as always for maintaining mlx-lm. Just checking in on this PR. It is still ready for review, and I re-ran the focused PLaMo 3 model tests locally today.
No urgency from my side, but I would really appreciate it if you or another maintainer could take a look when you have a chance. Please let me know if there is anything I should adjust to make the review easier.
Thank you to the mlx-lm maintainers for building and maintaining this excellent library. It is a pleasure to contribute support for another open model family to the project.
Summary
- Adds `plamo3` model support for conversion and generation (a conversion sketch follows this list).
- Tokenizer loading uses the standard `AutoTokenizer` path. The official PLaMo 3 repositories include `tokenization_plamo.py` and `auto_map` metadata for `Plamo3Tokenizer`, so this PR does not vendor a tokenizer implementation into mlx-lm.
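For concreteness, here is a hedged conversion sketch using mlx-lm's Python API. This is not code from the PR diff; the Hugging Face repo id and output path are placeholders, not confirmed checkpoint names, and remote code may additionally need to be enabled for the tokenizer depending on the checkpoint.

```python
# Hedged conversion sketch, assuming mlx_lm's standard convert entry point.
# The repo id below is a placeholder; substitute an actual PLaMo 3 checkpoint.
from mlx_lm import convert

convert(
    hf_path="pfnet/plamo-3-nict-2b-base",  # placeholder repo id
    mlx_path="plamo3-2b-mlx-4bit",         # local output directory
    quantize=True,                          # 4-bit quantization by default
)
```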
Tokenizer note
PLaMo 3 tokenizer use requires Hugging Face remote code, so users should pass `--trust-remote-code` or set `tokenizer_config={"trust_remote_code": True}` when loading these checkpoints. The upstream tokenizer/modeling code also has additional runtime dependencies (`torch` and `numba`) that are not added to mlx-lm's core dependencies by this PR. This follows the existing PLaMo 2 behavior, where using the upstream tokenizer can require model-specific remote-code dependencies.
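As a usage illustration (not part of the PR diff), here is a hedged sketch of loading a PLaMo 3 checkpoint with remote tokenizer code enabled via the `tokenizer_config` path described above. The repo id is a placeholder, and the upstream tokenizer's extra dependencies (`torch`, `numba`) must be installed separately.

```python
# Hedged usage sketch: load a PLaMo 3 checkpoint with remote tokenizer code
# enabled, then run a short generation with mlx_lm's Python API.
from mlx_lm import load, generate

model, tokenizer = load(
    "pfnet/plamo-3-nict-8b-base",  # placeholder repo id; substitute the real one
    tokenizer_config={"trust_remote_code": True},
)
print(generate(model, tokenizer, prompt="こんにちは。", max_tokens=50))
```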
About PLaMo 3
PLaMo 3 is a next-generation LLM series developed by Preferred Networks in collaboration with NICT. The official PFN blog describes it as part of an effort to build safe, high-performance Japanese domestic LLMs using large, high-quality datasets with attention to Japanese culture and society.
The blog explains that PLaMo 3 moves away from the Samba-based architecture used in PLaMo 2 and instead combines full attention with sliding-window attention, similar in spirit to Gemma 3. This is intended to reduce inference time and KV-cache memory usage while still allowing full-attention layers to capture relationships between distant tokens. PFN reports pretraining experiments for 2B, 8B, and 31B base models, with data mixed across English, Japanese, code, and multilingual corpora, and has published PLaMo 3 NICT 2B/8B/31B Base checkpoints on Hugging Face.
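To make the attention distinction concrete, here is a minimal, illustrative sketch (not the PR's implementation) of how a sliding-window mask differs from a full causal mask. The window size and layer pattern below are assumptions for illustration, not PLaMo 3's actual hyperparameters.

```python
# Minimal sketch: full causal attention vs. sliding-window attention masks.
import mlx.core as mx

def full_causal_mask(seq_len: int) -> mx.array:
    # Full attention: token i may attend to every earlier token j <= i.
    idx = mx.arange(seq_len)
    return idx[None, :] <= idx[:, None]

def sliding_window_mask(seq_len: int, window: int) -> mx.array:
    # Sliding-window attention: token i may attend only to j with
    # i - window < j <= i, which bounds per-layer attention cost and
    # KV-cache growth by the window size instead of the full context.
    idx = mx.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]
    recent = (idx[:, None] - idx[None, :]) < window
    return mx.logical_and(causal, recent)

# Illustrative hybrid layout: mostly sliding-window layers with periodic
# full-attention layers to preserve long-range token interactions.
layer_uses_full_attention = [(i + 1) % 4 == 0 for i in range(32)]
```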
Reference: https://tech.preferred.jp/ja/blog/plamo_3_8b_31b/
Hugging Face:
Validation
python -m pytest tests/test_models.py -k plamo3 -q