add support for flux2 klein by leejet · Pull Request #1193 · leejet/stable-diffusion.cpp

leejet · 2026-01-15T16:41:13Z

Flux.2 klein 4B

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\flux-2-klein-4b.safetensors --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_4b.safetensors -p "a lovely cat" --cfg-scale 1.0 --steps 4 -v --offload-to-cpu --diffusion-fa

Flux.2 klein 9B

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\flux-2-klein-9b.safetensors --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_8b.safetensors -p "a lovely cat" --cfg-scale 1.0 --steps 4 -v --offload-to-cpu --diffusion-fa

Flux.2 klein 4B edit

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\flux-2-klein-4b.safetensors --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_4b.safetensors -r .\kontext_input.png -p "change 'flux.cpp' to 'klein.cpp'" --cfg-scale 1.0 --sampling-method euler -v --diffusion-fa --offload-to-cpu --steps 4

Flux.2 klein 9B edit

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\flux-2-klein-9b.safetensors --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_8b.safetensors -r .\kontext_input.png -p "change 'flux.cpp' to 'klein.cpp'" --cfg-scale 1.0 --sampling-method euler -v --diffusion-fa --offload-to-cpu --steps 4

stable-diffusion.cpp

leejet · 2026-01-16T14:09:52Z

There are still some issues with the Flux.2 klein support. Padding needs to be applied during tokenization and attention_mask must be used in llm.hpp, but llm.hpp’s handling of attention_mask currently seems to have problems: when attention_mask is enabled, the results become NaN. This is the same issue seen with longcat image. I am still investigating and working on a fix.
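To make the padding and attention_mask point concrete, here is a minimal Python sketch of padded tokenization and a masked softmax. It is not the code in llm.hpp; `pad_tokens`, `masked_softmax`, and the `pad_id` value are hypothetical names chosen for illustration. It shows one common way a mask can lead to NaN (a row in which every position is masked), which may or may not be the cause of the issue described above.

```python
import math

def pad_tokens(token_ids, max_len, pad_id=0):
    """Pad (or truncate) to max_len and return (ids, attention_mask).

    mask[i] is 1 for a real token and 0 for padding.
    pad_id=0 is an assumption for this sketch only.
    """
    ids = list(token_ids[:max_len])
    n_pad = max_len - len(ids)
    mask = [1] * len(ids) + [0] * n_pad
    return ids + [pad_id] * n_pad, mask

def masked_softmax(scores, mask):
    """Softmax over scores, with masked positions set to -inf first."""
    masked = [s if m else -math.inf for s, m in zip(scores, mask)]
    peak = max(masked)
    # If every position is masked, peak is -inf and (-inf) - (-inf) is NaN,
    # so the whole row becomes NaN.
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

ids, mask = pad_tokens([5, 6, 7], max_len=5)
probs = masked_softmax([1.0, 2.0, 3.0, 9.0, 9.0], mask)   # padding gets weight 0
broken = masked_softmax([1.0, 2.0], [0, 0])               # fully masked row
```

In the partially masked case the padded positions receive zero weight and the remaining weights still sum to 1; in the fully masked case every entry is NaN, which then propagates through later layers.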

stduhpf · 2026-01-16T14:26:39Z

So the --clip-on-cpu workaround should also work there?

leejet · 2026-01-16T14:45:22Z

It doesn’t work on my side.

leejet · 2026-01-16T15:46:32Z

I think I’ve correctly fixed the attention_mask issue.

leejet · 2026-01-16T16:07:25Z

The quality of Flux.2 klein 4B doesn’t seem as good as z-image turbo.

Flux.2 klein 4b

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\flux-2-klein-4b.safetensors --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_4b.safetensors -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 5.0 --steps 4 -v --offload-to-cpu --diffusion-fa -v -H 1024 -W 512 --rng cpu

Flux.2 klein base 4b

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\flux-2-klein-base-4b.safetensors --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_4b.safetensors -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 5.0 --steps 20 -v --offload-to-cpu --diffusion-fa -v -H 1024 -W 512 --rng cpu

Green-Sky · 2026-01-16T17:21:33Z

> The quality of Flux.2 klein 4B doesn’t seem as good as z-image turbo.
>
> Flux.2 klein 4b
>
> .\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\flux-2-klein-4b.safetensors --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_4b.safetensors -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 5.0 --steps 4 -v --offload-to-cpu --diffusion-fa -v -H 1024 -W 512 --rng cpu

Not sure about cfg, but they use guidance_scale=1.0 for the distilled (non-base) model.

Also they use guidance_scale=4.0 and num_inference_steps=50 for the base model.

(ref is 4b variants on hf)

edit: cfg of 5 seems comparatively high for models that take larger llm embedding inputs.
edit2: logger.warning(f"Guidance scale {guidance_scale} is ignored for step-wise distilled models.") hmm

edit3:

    def do_classifier_free_guidance(self):
        return self._guidance_scale > 1 and not self.config.is_distilled

So cfg should be 1 for the distilled model.
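As a rough illustration of why a scale of 1 is the natural setting here, the standard classifier-free guidance formula can be sketched as below. `cfg_combine` and `needs_uncond_pass` are hypothetical helper names, and the lists stand in for model outputs; this is not the diffusers or sd.cpp implementation.

```python
def cfg_combine(cond, uncond, guidance_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), elementwise."""
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]

def needs_uncond_pass(guidance_scale, is_distilled):
    """Mirrors the condition quoted above: guidance only runs when
    the scale is above 1 and the model is not distilled."""
    return guidance_scale > 1 and not is_distilled

cond, uncond = [1.0, 2.0], [0.0, 0.5]
same_as_cond = cfg_combine(cond, uncond, 1.0)   # scale 1 returns cond unchanged
guided = cfg_combine(cond, uncond, 4.0)         # pushes away from uncond
```

With a scale of 1 the unconditional term cancels out, so the result equals the conditional prediction and the unconditional pass contributes nothing.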

leejet · 2026-01-16T17:34:59Z

> Not sure about cfg, but they use guidance_scale=1.0 for the distilled (non-base) model.

They changed the README on Hugging Face. When I first checked it, the distilled model was also using guidance_scale = 4.0. After changing guidance_scale to 1.0f, the image quality did improve a bit, but it’s still not as good as z-image turbo.

https://huggingface.co/black-forest-labs/FLUX.2-klein-4B/commit/5e67da950fce4a097bc150c22958a05716994cea

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\flux-2-klein-4b.safetensors --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_4b.safetensors -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 1.0 --steps 4 -v --offload-to-cpu --diffusion-fa -v -H 1024 -W 512 --rng cpu

leejet · 2026-01-16T17:41:00Z

> Also they use guidance_scale=4.0 and num_inference_steps=50 for the base model.

In fact, many Hugging Face examples for base models use relatively large step counts, like 40–50 — for example, SDXL uses 40 — but in practice, using around 20 steps often already gives good results.

This is the result with 50 steps. The quality has improved somewhat, but not by much.

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\flux-2-klein-base-4b.safetensors --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_4b.safetensors -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 5.0 --steps 50 -v --offload-to-cpu --diffusion-fa -v -H 1024 -W 512 --rng cpu

Green-Sky · 2026-01-16T17:41:02Z

@leejet you talk about guidance scale, but your command only shows the cfg scale change. Or did you code the guidance scale?

Green-Sky · 2026-01-16T17:41:50Z

Oh, and have you tried reference image(s)? This is a clear advantage over e.g. z-image.

leejet · 2026-01-16T17:46:06Z

> @leejet you talk about guidance scale, but your command only shows the cfg scale change. Or did you code the guidance scale?

guidance_scale in diffusers == --cfg-scale in sd.cpp

leejet · 2026-01-16T17:47:47Z

> Oh, and have you tried reference image(s)? This is a clear advantage over e.g. z-image.

Here I’m comparing the performance for T2I. Using a reference image means it’s image editing, which is a different task. Currently, z-image turbo does not support image editing.

Green-Sky · 2026-01-16T18:06:57Z

> > @leejet you talk about guidance scale, but your command only shows the cfg scale change. Or did you code the guidance scale?
>
> guidance_scale in diffusers == --cfg-scale in sd.cpp

Guidance scale as defined in [Classifier-Free Diffusion Guidance]

You are right, I did not know that.

> > Oh, and have you tried reference image(s)? This is a clear advantage over e.g. z-image.
>
> Here I’m comparing the performance for T2I. Using a reference image means it’s image editing, which is a different task. Currently, z-image turbo does not support image editing.

Yes, I was asking because you did not show any examples yet. :)

leejet · 2026-01-16T18:12:31Z

> Yes, I was asking because you did not show any examples yet. :)

I’ve updated some examples of image editing. You can take a look. I think the overall quality of the image edits is pretty good.

fcore117 · 2026-01-21T23:27:14Z

Hello, what am I missing? With the default steps I get bad images, while Z-Image, for example, works at full power. Other people already get OK images with 4 steps, but for me 4 steps gives only a messy image.
_FLUX2_Klein.cmd.txt

Green-Sky · 2026-01-22T11:41:38Z

> Hello, what am I missing? With the default steps I get bad images, while Z-Image, for example, works at full power. Other people already get OK images with 4 steps, but for me 4 steps gives only a messy image. _FLUX2_Klein.cmd.txt

There are 2 versions. A distilled model and an undistilled model (base) which is what you are using. 4 steps will only give good results with the distilled version.
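The difference between the two variants can be summarized as a small settings table. The sketch below is only an illustration: `SamplingPreset`, `PRESETS`, and `build_args` are hypothetical names, the values are the ones quoted earlier in this thread from the Hugging Face reference (distilled: 4 steps, scale 1.0; base: 50 steps, scale 4.0), and the flags are the ones used in the commands above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SamplingPreset:
    steps: int
    cfg_scale: float

# Values as quoted in this thread; treat them as starting points, not rules.
PRESETS = {
    "distilled": SamplingPreset(steps=4, cfg_scale=1.0),
    "base": SamplingPreset(steps=50, cfg_scale=4.0),
}

def build_args(variant, diffusion_model, vae, llm, prompt):
    """Build an sd-cli argument list for the given Flux.2 klein variant."""
    preset = PRESETS[variant]
    return [
        "sd-cli",
        "--diffusion-model", diffusion_model,
        "--vae", vae,
        "--llm", llm,
        "-p", prompt,
        "--cfg-scale", str(preset.cfg_scale),
        "--steps", str(preset.steps),
    ]

# Placeholder file names, for illustration only.
args = build_args("base", "model.safetensors", "vae.safetensors",
                  "llm.safetensors", "a lovely cat")
```

Running the base model with the distilled preset (4 steps) is the mismatch described above.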

fcore117 · 2026-01-22T17:35:00Z

> > Hello, what am I missing? With the default steps I get bad images, while Z-Image, for example, works at full power. Other people already get OK images with 4 steps, but for me 4 steps gives only a messy image. _FLUX2_Klein.cmd.txt
>
> There are 2 versions. A distilled model and an undistilled model (base) which is what you are using. 4 steps will only give good results with the distilled version.

Thank you for the very useful note. I am new to AI stuff and this is good to know. I started using Stable-Diffusion.cpp because I hate fat, bloated software with a passion, and for me SDcpp is easier to use and is very portable, light, and fast, and does not depend on system paths etc.

I presume this is a general rule about distilled vs. undistilled models? Do undistilled models just need a very high step count?

Green-Sky · 2026-01-23T10:47:17Z

> Thank you for the very useful note. I am new to AI stuff and this is good to know. I started using Stable-Diffusion.cpp because I hate fat, bloated software with a passion, and for me SDcpp is easier to use and is very portable, light, and fast, and does not depend on system paths etc.
>
> I presume this is a general rule about distilled vs. undistilled models? Do undistilled models just need a very high step count?

This is getting a bit off-topic, so feel free to open a discussion for further questions (or PM on Tox or something).

Generally there are different forms of "distillation". In this case here it was a step-distillation AND a cfg-distillation. Both reduce how often the diffusion model has to be run per image.

Also generally, every model requires its own set of parameters. Some work better than others.
E.g., most transformer-based models (flux, z-image, ...) work best with simple/smoothstep schedulers and non-ancestral samplers. But this is very much model-dependent.
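The effect of the two distillations on cost can be estimated with simple arithmetic. The sketch below assumes a sampler that evaluates the diffusion model once per step (such as euler) and assumes the unconditional pass is skipped when the scale is 1 or lower; `model_evaluations` is a hypothetical helper, not an sd.cpp function.

```python
def model_evaluations(steps, cfg_scale):
    """Approximate diffusion-model forward passes for one image.

    Assumes one evaluation per step, doubled when classifier-free
    guidance needs a second (unconditional) pass.
    """
    passes_per_step = 2 if cfg_scale > 1.0 else 1
    return steps * passes_per_step

base_cost = model_evaluations(steps=50, cfg_scale=4.0)      # 50 steps, 2 passes each
distilled_cost = model_evaluations(steps=4, cfg_scale=1.0)  # 4 steps, 1 pass each
```

Under these assumptions the base settings cost 100 evaluations per image and the distilled settings cost 4.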

add support for flux2 klein 4b

cccc737

loci-dev mentioned this pull request Jan 15, 2026

UPSTREAM PR #1193: add support for flux2 klein auroralabs-loci/stable-diffusion.cpp#17

Open

stduhpf reviewed Jan 16, 2026


add support for flux2 klein 8b

6a478d2

leejet added 3 commits January 16, 2026 23:33

use attention_mask in Flux.2 klein LLMEmbedder

248fc2b

format code

130ca90

remove unnecessary scale

9368608

update docs

5e84587

leejet merged commit 9565c7f into master Jan 17, 2026
13 checks passed

leejet deleted the klein branch January 25, 2026 16:46
