klein.c is a compact, pure C implementation of iris.c for text-to-image generation with FLUX.2 Klein transformer models, built specifically for Windows with native Win32 GUI support.
It provides a full CPU inference pipeline that generates images from text prompts using the FLUX.2 diffusion transformer architecture. klein.c is derived from and inspired by iris.c by Salvatore Sanfilippo (@antirez), with optimizations for CPU-only execution on Windows.
- Pure CPU Inference: No GPU required - runs entirely on CPU using BLAS/OpenBLAS
- Windows Native GUI: Built-in Win32 graphical interface for easy image generation
- Memory-Efficient: Sequential model loading (Encoder -> Transformer -> VAE) to minimize RAM usage
- BF16 Hardware Detection: Automatic detection of AVX512-BF16 support (Intel Ice Lake+, AMD Zen 4+)
- High-Resolution Timers: Detailed benchmarking with Windows QueryPerformanceCounter APIs
- Multiple Output Formats: Saves images as both BMP and PNG formats
klein.c implements the complete FLUX.2 inference pipeline:
- Vocabulary: 151,936 tokens
- Hidden Size: 2,560
- Layers: 36 transformer layers
- Attention: 32 heads with 8 KV heads
- Sequence Length: 512 (padded)
- Output Layers: Layers 9, 18, 27 concatenated for final embeddings
- Embedding Dimension: 7,680 (3 × 2,560)
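The 7,680-dim embedding comes from concatenating each token's hidden state from layers 9, 18, and 27. A minimal sketch of that concatenation, assuming a row-major `[seq, hidden]` layout (the layout is our assumption, not taken from klein.c):

```c
#include <string.h>

#define HIDDEN 2560

/* Concatenate per-token hidden states from three intermediate
 * encoder layers into one 3*HIDDEN = 7680-dim embedding. */
static void concat_layers(float *out, const float *l9,
                          const float *l18, const float *l27,
                          int seq_len) {
    for (int t = 0; t < seq_len; t++) {
        float *dst = out + (size_t)t * 3 * HIDDEN;
        memcpy(dst,              l9  + (size_t)t * HIDDEN, HIDDEN * sizeof(float));
        memcpy(dst + HIDDEN,     l18 + (size_t)t * HIDDEN, HIDDEN * sizeof(float));
        memcpy(dst + 2 * HIDDEN, l27 + (size_t)t * HIDDEN, HIDDEN * sizeof(float));
    }
}
```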
The tokenizer uses BPE (Byte Pair Encoding) with a custom vocabulary and merge table.
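The heart of BPE is repeatedly fusing the highest-priority adjacent token pair from the merge table. The fuse step alone can be sketched like this (the function name and signature are illustrative, not klein.c's actual API):

```c
/* Fuse every adjacent (a, b) pair in a token-id sequence into the
 * merged id m, compacting in place. Real BPE loops this, each time
 * picking the best-ranked pair from the merge table. */
static int merge_pair(int *toks, int n, int a, int b, int m) {
    int w = 0;
    for (int r = 0; r < n; r++) {
        if (r + 1 < n && toks[r] == a && toks[r + 1] == b) {
            toks[w++] = m;   /* replace the pair with the merged id */
            r++;             /* skip the second half of the pair    */
        } else {
            toks[w++] = toks[r];
        }
    }
    return w; /* new sequence length */
}
```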
- Hidden Size: 3,072
- Attention Heads: 24
- Head Dimension: 128
- MLP Hidden: 9,216 (3× hidden)
- Double Blocks: 5 (joint image-text attention)
- Single Blocks: 20 (image-only attention)
- Latent Channels: 128
- RoPE Theta: 2,000
- Max Sequence: 52,000 tokens
The transformer uses rectified flow for faster convergence, predicting velocity instead of noise.
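With a velocity prediction, each denoising step is just a linear (Euler) update of the latent along the predicted field. A minimal sketch of one step; the actual step function in klein.c may differ:

```c
#include <stddef.h>

/* One Euler integration step of rectified flow: the transformer
 * predicts a velocity v(x, t), and the latent moves linearly
 * along it by the step size dt. */
static void euler_step(float *latent, const float *velocity,
                       float dt, size_t n) {
    for (size_t i = 0; i < n; i++)
        latent[i] += dt * velocity[i];
}
```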
- Latent Channels: 32 → 128
- Base Channels: 128
- Channel Multipliers: [1, 2, 4, 4]
- Resolution: 8× spatial compression
- Residual Blocks: 2 per layer
- Attention Blocks: Included in decoder
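Putting the numbers together: the VAE compresses 8× spatially into 32 channels, and packing 2×2 latent patches into the channel dimension yields the 128-channel latent at H/16 × W/16 seen in the pipeline diagram. A small helper showing that arithmetic (our interpretation of the stated shapes):

```c
typedef struct { int c, h, w; } LatentShape;

/* Compute the transformer-side latent shape from the image size:
 * 8x VAE compression times 2x2 patchification = /16 spatially,
 * 32 VAE channels * 4 = 128 channels. */
static LatentShape latent_shape(int img_h, int img_w) {
    LatentShape s;
    s.c = 32 * 2 * 2;
    s.h = img_h / 16;
    s.w = img_w / 16;
    return s;
}
```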
```
Text Prompt
    ↓
[1] Qwen3 Encoder (load → encode → free)
    ↓
Text Embeddings [512, 7680]
    ↓
[2] FLUX Transformer (load → denoise → free)
    ↓
Denoised Latent [128, H/16, W/16]
    ↓
[3] VAE Decoder (load → decode → free)
    ↓
Final Image [3, H, W]
    ↓
Save as PNG/BMP
```
```
klein_cpu.exe <model_dir> [prompt] [-s steps] [-S seed] [-W width] [-H height]
```

Arguments:

- `model_dir` - Path to the FLUX.2 model directory (containing safetensors files)
- `prompt` - Text description of the image to generate (default: "a red apple")
- `-s steps` - Number of denoising steps (default: 1)
- `-S seed` - Random seed for reproducibility (default: 42)
- `-W width` - Output image width (default: 64)
- `-H height` - Output image height (default: 64)
Example:
```
klein_cpu.exe C:/models/flux-klein "a beautiful sunset over ocean" -s 4 -S 123 -W 512 -H 512
```

Simply run `klein_cpu.exe` without arguments to launch the graphical interface:

```
klein_cpu.exe
```
The GUI provides:
- Text prompt input
- Model folder selection (with browse button)
- Width/Height/Seed/Steps configuration
- Generate button
- Status display with inference time
- Generated image preview
klein.c includes detailed timing for each pipeline stage:
```
================================================================================
PERFORMANCE TIMINGS
================================================================================
Encoder Loading:     8.50 seconds
Transformer Load:   15.20 seconds
VAE Loading:        12.30 seconds
--------------------------------------------------------------------------------
Text Encoding:       2.10 seconds
Denoising:          45.00 seconds
VAE Decoding:        8.50 seconds
--------------------------------------------------------------------------------
TOTAL INFERENCE:    91.60 seconds
================================================================================
```
The application automatically detects hardware support for BF16:
- Native (AVX512-BF16): Intel Ice Lake+ processors
- Emulated (F32): Older CPUs without BF16 support
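AVX512-BF16 support is reported by CPUID leaf 7, sub-leaf 1, EAX bit 5. A sketch of what such a check may look like (klein.c's actual detection code may differ; non-x86 builds simply report no support):

```c
/* Detect AVX512-BF16 via CPUID.(EAX=7, ECX=1):EAX[bit 5]. */
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__)
#  if defined(_MSC_VER)
#    include <intrin.h>
static int has_avx512_bf16(void) {
    int regs[4];
    __cpuidex(regs, 7, 1);
    return (regs[0] >> 5) & 1;    /* EAX bit 5 */
}
#  else
#    include <cpuid.h>
static int has_avx512_bf16(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx))
        return 0;                 /* CPU too old for leaf 7 */
    return (eax >> 5) & 1;        /* EAX bit 5 */
}
#  endif
#else
static int has_avx512_bf16(void) { return 0; }
#endif
```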
klein.c requires the FLUX.2 Klein model files in safetensors format:
```
model_dir/
├── model.safetensors       # Main model weights
├── tokenizer.json          # BPE tokenizer
└── tokenizer_config.json   # Tokenizer configuration
```
Expected tensor names:
- `encoder.*` - Qwen3 encoder weights
- `transformer.*` - FLUX transformer weights
- `vae.*` - VAE decoder weights
klein.c uses a low-RAM sequential loading strategy:
- Load encoder → encode text → free encoder
- Load transformer → denoise → free transformer
- Load VAE → decode → free VAE
This keeps peak memory low: only one model is ever resident at a time.
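A toy illustration of the load → use → free discipline, using a resident-model counter to show that the peak never exceeds one (everything here is a stand-in, not klein.c's actual API):

```c
static int resident = 0, peak = 0;

static void stage_load(void) { if (++resident > peak) peak = resident; }
static void stage_free(void) { resident--; }

/* Run the three stages sequentially and return the peak number of
 * models resident at once: 1, versus 3 if all were loaded up front. */
static int run_pipeline(void) {
    stage_load(); /* [1] Qwen3 encoder: encode text      */ stage_free();
    stage_load(); /* [2] FLUX transformer: denoise       */ stage_free();
    stage_load(); /* [3] VAE decoder: decode to pixels   */ stage_free();
    return peak;
}
```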
- QueryPerformanceCounter: High-resolution timing
- Win32 GUI: Native window with controls
- CreateProcess: Spawns CLI for generation from GUI
- SHBrowseForFolder: Folder browser dialog
- BMP/PNG Saving: Windows-compatible image formats
- Weights: Stored as FP16/BF16, converted to FP32 for computation
- Latents: Float32 throughout pipeline
- Attention: Flash attention style with proper masking
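BF16 is simply the top 16 bits of an IEEE-754 float32, so the conversion to FP32 for computation is a 16-bit shift and a bit-pattern reinterpretation. A self-contained sketch (the helper name is ours):

```c
#include <stdint.h>
#include <string.h>

/* Widen one bfloat16 value (stored as uint16_t) to float by shifting
 * its bits into the high half of a float32 and reinterpreting. */
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);   /* type-pun without UB */
    return f;
}
```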
- Windows 10/11
- MSVC or MinGW-w64 compiler
- OpenBLAS (optional, for faster matrix operations)
```
mkdir build
cd build
cmake .. -G "Visual Studio 17 2022"   # or "MinGW Makefiles"
cmake --build . --config Release
```

```
klein.c/
├── main_cpu.c       # Entry point + GUI implementation
├── klein_cpu.h      # Header with all API definitions
├── klein_cpu.c      # Implementation of all components
├── CMakeLists.txt   # CMake build configuration
└── README.md        # This file
```
| Feature | iris.c | klein.c (klein_cpu) |
|---|---|---|
| Platform | macOS/Linux | Windows |
| GPU | Metal (Apple Silicon) | CPU only |
| Dependencies | Optional BLAS | OpenBLAS (optional) |
| GUI | Terminal display | Win32 native GUI |
| Models | Multiple FLUX variants | FLUX.2 Klein focused |
| Memory | mmap support | Sequential loading |
- Original iris.c: Salvatore Sanfilippo (@antirez)
- FLUX.2 Models: Black Forest Labs
- klein.c/CPU Port: Camenduru
MIT License
This project is derived from iris.c which is also MIT licensed. See the original iris.c repository for more details.