Add PLaMo 3 model support #1234
Conversation
Hi @angeloskath, sorry for the direct ping. This PR has been ready for review for a few days and currently has no reviewer assigned. The diff is intentionally scoped to native PLaMo 3 model support plus focused model tests.
I also re-ran the focused test locally (`python -m pytest tests/test_models.py -k plamo3 -q`).
Would you or another maintainer be able to take a look when you have bandwidth?
Hi @angeloskath, sorry for the follow-up ping, and thank you as always for maintaining mlx-lm. Just checking in on this PR. It is still ready for review, and I re-ran the focused PLaMo 3 model tests locally today.
No urgency from my side, but I would really appreciate it if you or another maintainer could take a look when you have a chance. Please let me know if there is anything I should adjust to make the review easier.
Thank you to the mlx-lm maintainers for building and maintaining this excellent library. It is a pleasure to contribute support for another open model family to the project.
Summary
- Adds `plamo3` model support for conversion and generation (a conversion sketch follows this list).
- Tokenizer loading uses the standard `AutoTokenizer` path. The official PLaMo 3 repositories include `tokenization_plamo.py` and `auto_map` metadata for `Plamo3Tokenizer`, so this PR does not vendor a tokenizer implementation into mlx-lm.
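For concreteness, here is a hedged conversion sketch using mlx-lm's Python API. This is not code from the PR diff; the Hugging Face repo id and output path are placeholders, not confirmed checkpoint names, and remote code may additionally need to be enabled for the tokenizer depending on the checkpoint.

```python
# Hedged conversion sketch, assuming mlx_lm's standard convert entry point.
# The repo id below is a placeholder; substitute an actual PLaMo 3 checkpoint.
from mlx_lm import convert

convert(
    hf_path="pfnet/plamo-3-nict-2b-base",  # placeholder repo id
    mlx_path="plamo3-2b-mlx-4bit",         # local output directory
    quantize=True,                          # 4-bit quantization by default
)
```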
Tokenizer note
PLaMo 3 tokenizer use requires Hugging Face remote code, so users should pass `--trust-remote-code` or set `tokenizer_config={"trust_remote_code": True}` when loading these checkpoints. The upstream tokenizer/modeling code also has additional runtime dependencies (`torch` and `numba`) that are not added to mlx-lm's core dependencies by this PR. This follows the existing PLaMo 2 behavior, where using the upstream tokenizer can require model-specific remote-code dependencies.
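As a usage illustration (not part of the PR diff), here is a hedged sketch of loading a PLaMo 3 checkpoint with remote tokenizer code enabled via the `tokenizer_config` path described above. The repo id is a placeholder, and the upstream tokenizer's extra dependencies (`torch`, `numba`) must be installed separately.

```python
# Hedged usage sketch: load a PLaMo 3 checkpoint with remote tokenizer code
# enabled, then run a short generation with mlx_lm's Python API.
from mlx_lm import load, generate

model, tokenizer = load(
    "pfnet/plamo-3-nict-8b-base",  # placeholder repo id; substitute the real one
    tokenizer_config={"trust_remote_code": True},
)
print(generate(model, tokenizer, prompt="こんにちは。", max_tokens=50))
```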
About PLaMo 3
PLaMo 3 is a next-generation LLM series developed by Preferred Networks in collaboration with NICT. The official PFN blog describes it as part of an effort to build safe, high-performance Japanese domestic LLMs using large, high-quality datasets with attention to Japanese culture and society.
The blog explains that PLaMo 3 moves away from the Samba-based architecture used in PLaMo 2 and instead combines full attention with sliding-window attention, similar in spirit to Gemma 3. This is intended to reduce inference time and KV-cache memory usage while still allowing full-attention layers to capture relationships between distant tokens. PFN reports pretraining experiments for 2B, 8B, and 31B base models, with data mixed across English, Japanese, code, and multilingual corpora, and has published PLaMo 3 NICT 2B/8B/31B Base checkpoints on Hugging Face.
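To make the attention distinction concrete, here is a minimal, illustrative sketch (not the PR's implementation) of how a sliding-window mask differs from a full causal mask. The window size and layer pattern below are assumptions for illustration, not PLaMo 3's actual hyperparameters.

```python
# Minimal sketch: full causal attention vs. sliding-window attention masks.
import mlx.core as mx

def full_causal_mask(seq_len: int) -> mx.array:
    # Full attention: token i may attend to every earlier token j <= i.
    idx = mx.arange(seq_len)
    return idx[None, :] <= idx[:, None]

def sliding_window_mask(seq_len: int, window: int) -> mx.array:
    # Sliding-window attention: token i may attend only to j with
    # i - window < j <= i, which bounds per-layer attention cost and
    # KV-cache growth by the window size instead of the full context.
    idx = mx.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]
    recent = (idx[:, None] - idx[None, :]) < window
    return mx.logical_and(causal, recent)

# Illustrative hybrid layout: mostly sliding-window layers with periodic
# full-attention layers to preserve long-range token interactions.
layer_uses_full_attention = [(i + 1) % 4 == 0 for i in range(32)]
```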
Reference: https://tech.preferred.jp/ja/blog/plamo_3_8b_31b/
Hugging Face:
Validation
python -m pytest tests/test_models.py -k plamo3 -q