feat: add diffusers format support for FLUX.2-klein #1369
Open
ComputerPers wants to merge 9 commits into leejet:master from
Conversation
…ion()

Add checks for diffusers-style tensor names alongside the existing BFL-format checks:

- double_stream_modulation_img.linear.weight (.linear. vs .lin.)
- single_transformer_blocks.47. (diffusers block naming)

This enables get_sd_version() to correctly identify FLUX.2 and FLUX.2-klein models loaded from diffusers-format safetensors checkpoints.
…LUX.2 diffusers support

Implement diffusers→BFL tensor name mapping for the FLUX.2 architecture:

- Shared modulation: .linear. → .lin.
- Time embedding: time_guidance_embed → time_in
- Double blocks: transformer_blocks → double_blocks (with SwiGLU MLP)
- Single blocks: single_transformer_blocks → single_blocks (fused qkv)
- Final layer mapping

Block counts are parametric (not hardcoded) to accommodate future FLUX.2 variants. Route FLUX.2 versions to the new function in convert_diffusion_model_name().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add dst_offset to TensorStorage for partial tensor loads. Detect split tensor parts (.weight.1, .weight.2) and compute byte offsets to fuse them into the expected combined tensor. Handle the adaLN modulation half-swap (diffusers [shift, scale] → BFL [scale, shift]). Add an SD_DUMP_CHECKSUMS env var for tensor-comparison debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
41 tests covering all tensor categories: embedders, shared modulations, double blocks (q/k/v split, proj, norms, SwiGLU MLP), single blocks (fused qkv, proj, norms), and the final layer. Both the VERSION_FLUX2 and VERSION_FLUX2_KLEIN variants are tested.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add init_from_safetensors_index() to load multi-shard safetensors files by reading model.safetensors.index.json and loading each shard. Update init_from_file() to auto-detect index.json when the .safetensors file is missing. Update init_from_diffusers_file() to support a transformer/ directory (DiT models such as FLUX and SD3) alongside unet/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Read model_index.json to detect LLM-based text encoders (Qwen, Llama) and use the text_encoders.llm. prefix instead of te. for correct loading. Support a transformer/ directory in init_from_diffusers_file() for DiT models (FLUX, SD3).

Enables single-directory loading: sd-cli --diffusion-model <diffusers-dir>/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Not very useful if you ask me.
This adds support for loading FLUX.2-klein models in HuggingFace diffusers format — users can load models directly from ~/.cache/huggingface/hub/ without converting to GGUF or BFL first. Also fixes #699 — the split-fuse qkv fix is generic and resolves diffusers-format loading for SD3, FLUX.1 and Z-Image as well.
Along the way I fixed two issues that affect all diffusers-format models, not just FLUX.2:
Split-fuse qkv loading. The .weight.N suffix convention in name_conversion.cpp was already used for SD3, FLUX.1 and Z-Image to split q/k/v into separate tensors, but load_tensors() never handled the split parts — they were silently dropped as "unknown tensor" and the base tensor had a shape mismatch. This PR adds dst_offset support so split parts get written into the correct positions of the fused destination tensor. This fixes diffusers loading for all models that use split qkv, not just FLUX.2.

Multi-shard safetensors. Large HF models are often sharded into multiple .safetensors files with a model.safetensors.index.json index. Added init_from_safetensors_index(), which reads the index and loads each shard. This enables loading sharded text encoders (e.g. the FLUX.2-klein-4B Qwen3 TE is 2 shards, the FLUX.1-dev T5 is 2 shards).

Other changes:
- init_from_diffusers_file() now tries transformer/ before unet/ for DiT models
- Read model_index.json to auto-detect LLM text encoders (Qwen, Llama) and use the correct prefix

Before / After
Without this PR, loading FLUX.2-klein-4B diffusers format fails:
With this PR, both work:
Testing
Tested on Apple Silicon (M3 Max, 36 GB):
- Single-directory loading (--diffusion-model <dir>/): auto-detected Qwen3 TE, generated OK

BFL (reference) vs Diffusers (this PR) — FLUX.2-klein-4B, 1024x1024, seed 42, 4 steps, prompt "gandalf you shall not pass":