Cacheable stable-diffusion.cpp (Fork for Streaming API)

Diffusion model (SD, Flux, Wan, ...) inference in pure C/C++

This repository is a fork of leejet/stable-diffusion.cpp that adds Condition Caching (the Streaming API). The upstream repo excels at stateless generation. This fork targets real-time video generation and high-throughput img2img streaming, where re-running a heavy text encoder (e.g., Qwen/LLM for Flux.2) on every frame becomes the dominant bottleneck.

By leveraging this fork's C API extensions, you can cache prompt conditions and reference images, skipping the LLM layers entirely on subsequent frames.

🚀 Fork-Specific Features (What's New?)

This repository introduces several major enhancements compared to the upstream implementation, specifically targeting high-performance streaming and advanced denoising optimization.

1. Condition Caching (Streaming API)

This is the primary feature of this fork. It separates the expensive preparation phases (text encoding and reference image encoding) from the hot-loop diffusion phase, so the preparation runs once and the diffusion phase runs per frame.

Goal: Real-time Video-to-Video and high-FPS img2img.
Key Functions: sd_encode_condition(), sd_encode_ref_image(), sd_img2img_with_cond().
Memory Safety: Includes dedicated FFI cleanup functions (sd_free_image_data, sd_free_images) for robust integration with Rust/Python across DLL boundaries.
📘 Streaming API Design & Architecture
📘 Streaming C API Reference
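
The encode-once pattern described above can be sketched without the library. The block below is a minimal, self-contained illustration and not the fork's implementation: every name in it (`toy_encoder_t`, `toy_condition_t`, `toy_encode`, `toy_process_frame`, `toy_stream`) is hypothetical, and a string hash stands in for the cached embedding. It only shows that the expensive call sits outside the per-frame loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: none of these names are part of stable-diffusion.h. */
typedef struct {
    int encode_calls; /* how many times the "expensive" encoder ran */
} toy_encoder_t;

typedef struct {
    unsigned long prompt_hash; /* stands in for the cached embedding tensor */
} toy_condition_t;

/* Expensive step: in a real pipeline this is where the text encoder would run. */
static toy_condition_t toy_encode(toy_encoder_t* enc, const char* prompt) {
    assert(enc != NULL && prompt != NULL);
    enc->encode_calls++;
    toy_condition_t cond = {5381UL};
    for (const char* p = prompt; *p != '\0'; ++p) {
        cond.prompt_hash = cond.prompt_hash * 33UL + (unsigned char)*p;
    }
    return cond;
}

/* Cheap per-frame step: reads the cached condition and never touches the encoder. */
static unsigned long toy_process_frame(const toy_condition_t* cond, unsigned long frame_id) {
    assert(cond != NULL);
    return cond->prompt_hash ^ frame_id;
}

/* Processes `frames` frames with one up-front encode; returns the encoder call count. */
static int toy_stream(toy_encoder_t* enc, const char* prompt, int frames) {
    toy_condition_t cond = toy_encode(enc, prompt); /* once, outside the hot loop */
    unsigned long sink = 0;
    for (int i = 0; i < frames; ++i) {
        sink ^= toy_process_frame(&cond, (unsigned long)i);
    }
    (void)sink;
    return enc->encode_calls;
}
```

However many frames `toy_stream` processes, `encode_calls` grows by one per call. The real API has the same shape, with `sd_encode_condition()` in the place of `toy_encode` and `sd_img2img_with_cond()` inside the loop.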

2. Advanced Denoising Caching (Spectrum, DBCache, etc.)

We have implemented several state-of-the-art caching algorithms that skip or predict UNet/DiT forward passes when the latent changes are small.

Algorithms: Spectrum (Chebyshev forecasting), DBCache (Block-level skipping), TaylorSeer, UCache, and EasyCache.
Performance: Can reduce inference time by 20%–50% with minimal quality loss.
📘 Denoising Caching Guide
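
To make "skip forward passes when the latent changes are small" concrete, here is a rough sketch of a generic threshold rule. It is an assumption-laden toy and does not reproduce Spectrum, DBCache, TaylorSeer, UCache, or EasyCache as implemented in this fork: the names (`toy_step_cache_t`, `toy_model`, `toy_rel_change`, `toy_cached_step`), the 4-element latent, and the relative L1 metric are all chosen for illustration. The idea shown is only that the output of the last real evaluation is reused while the input stays close to the input of that evaluation.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <string.h>

#define TOY_DIM 4 /* toy latent size; real latents are far larger */

typedef struct {
    float last_input[TOY_DIM];  /* input of the last real model evaluation */
    float last_output[TOY_DIM]; /* output of the last real model evaluation */
    int has_cache;
    int model_calls; /* real forward passes */
    int skipped;     /* steps served from the cache */
} toy_step_cache_t;

/* Stand-in for a UNet/DiT forward pass (hypothetical: just halves the input). */
static void toy_model(const float* x, float* out) {
    for (int i = 0; i < TOY_DIM; ++i) out[i] = 0.5f * x[i];
}

/* Relative L1 change of `a` with respect to `b`. */
static float toy_rel_change(const float* a, const float* b) {
    float num = 0.0f, den = 0.0f;
    for (int i = 0; i < TOY_DIM; ++i) {
        float d = a[i] - b[i];
        num += d < 0.0f ? -d : d;
        den += b[i] < 0.0f ? -b[i] : b[i];
    }
    return den > 0.0f ? num / den : FLT_MAX; /* zero reference: force a recompute */
}

/* Runs one denoising step, reusing the cached output when the input barely moved. */
static void toy_cached_step(toy_step_cache_t* c, const float* x, float threshold, float* out) {
    assert(c != NULL && x != NULL && out != NULL);
    if (c->has_cache && toy_rel_change(x, c->last_input) < threshold) {
        memcpy(out, c->last_output, sizeof c->last_output);
        c->skipped++;
        return;
    }
    toy_model(x, out);
    memcpy(c->last_input, x, sizeof c->last_input);
    memcpy(c->last_output, out, sizeof c->last_output);
    c->has_cache = 1;
    c->model_calls++;
}
```

Comparing against the input of the last real evaluation, rather than the previous step, keeps small changes from accumulating unnoticed across a run of skipped steps. The threshold trades speed for fidelity, which is where figures like the 20%–50% range above would come from in practice.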

3. Extended Model Support & Optimizations

Support for cutting-edge architectures and specific optimizations not found in the baseline repo.

Wan 2.1 / 2.2: High-quality video and image generation.
Flux.1 / Flux.2: Optimized DiT paths.
Chroma / Radiance: Specialized color and lighting models.
PhotoMaker: Identity-preserving personalization.

4. Specialized CI & Packaging (Rust/FFI Friendly)

The GitHub Actions workflows have been enhanced to satisfy the requirements of downstream FFI consumers (like the Rust wrapper cacheable-sd-rs).

Windows Artifacts: In addition to the DLL, the CI now packages stable-diffusion.lib (the import library). MSVC and Rust on Windows need this file to link against the DLL.
CI Robustness: Workflows have been refined to ensure binary artifacts are correctly bundled across CUDA, Vulkan, and CPU backends.
📘 CI Packaging Diff Memo

📚 Documentation Index

Detailed guides for various components of the library:

Category	Document	Description
Core API	C API Reference	Standard and Streaming API usage.
CLI	CLI Reference	Complete guide for the `sd-cli` tool.
Setup	Build Guide	Compiling with CUDA, Vulkan, Metal, etc.
Internal	Streaming API Design	Deep dive into GGML context management.
Advanced	Caching Guide	Accelerating inference via Spectrum/DBCache.
Models	Wan, Flux.1/2, SD3	High-end model parameters and usage.
Specialized	PhotoMaker, Chroma, Anima	Identity, color science, and animation extensions.
VLM/VIV	Qwen-Image, Z-Image	Visual understanding and editing tools.
Tuning	Performance	Tips for speed and VRAM management.

Upstream Features

This fork retains 100% compatibility with all the amazing features developed by the original stable-diffusion.cpp contributors:

Plain C/C++ implementation based on ggml, working similarly to llama.cpp.
Super lightweight and without external dependencies.
Supported Models: SD1.x, SD2.x, SDXL, SD3, FLUX.1/FLUX.2, Qwen-Image, Z-Image, Wan2.1/2.2, PhotoMaker, and more.
Supported Backends: CPU (AVX2/AVX512), CUDA, Vulkan, Metal, OpenCL, SYCL.
Supported Formats: PyTorch checkpoints (.ckpt/.pth), Safetensors (.safetensors), GGUF (.gguf).
Flash Attention for reduced memory usage.
LoRA support, ControlNet, LCM, ESRGAN upscaling, and TAESD faster latent decoding.

Quick Start

1. Build from Source

Since you will likely integrate this as a backend for another project, we recommend building from source. For full instructions, see the upstream Build Guide.

# Example: Building with Vulkan acceleration and Shared Libraries (C API)
mkdir build && cd build
cmake .. -DSD_VULKAN=ON -DSD_BUILD_SHARED_LIBS=ON
cmake --build . --config Release
# After a successful build, the CLI binary is at: build/bin/sd-cli
# The shared library is at: build/stable-diffusion.dll (Windows) or build/libstable-diffusion.so (Linux)

2. Standard CLI Usage

Download a core model file (e.g., v1-5-pruned-emaonly.safetensors from Hugging Face).

./bin/sd-cli -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat"

For detailed arguments and use cases (like img2img or LoRA), check out the CLI Guide.

3. Streaming API Quick Start (C/C++)

The key benefit of this fork is condition caching. Here is a minimal example:

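#include <stdlib.h>  // free()
#include "stable-diffusion.h"

// `streaming`, get_next_frame(), and render() are placeholders supplied by your application.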

// 1. Initialize context (once)
sd_ctx_params_t ctx_params;
sd_ctx_params_init(&ctx_params);
ctx_params.diffusion_model_path = "flux-2-klein-4b.gguf";
ctx_params.vae_path = "ae.safetensors";
ctx_params.llm_path = "qwen3-4b.gguf";
ctx_params.flash_attn = true;
sd_ctx_t* ctx = new_sd_ctx(&ctx_params);

// 2. Encode prompt ONCE (the expensive LLM step)
sd_condition_t* cond = sd_encode_condition(ctx, "cinematic oil painting", "", 512, 512);

// 3. Process each video frame cheaply (no re-encoding)
while (streaming) {
    sd_image_t frame = get_next_frame();
    sd_image_t result = sd_img2img_with_cond(ctx, frame, cond, NULL, 0, 0.75f, 4, 1.0f, -1, NULL);
    render(result);
    sd_free_image_data(result.data); // Use the new cleanup function
    free(frame.data); // frame.data is allocated by our get_next_frame()
}

// 4. Cleanup
sd_free_condition(cond);
free_sd_ctx(ctx);

For a full working example including reference image caching, see examples/stream_img2img/main.cpp.
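
The loop above frees `result.data` with `sd_free_image_data` but frees `frame.data` with plain `free`. The rule behind that split is that a buffer should be released by the same module that allocated it, since a DLL and its caller may use different heaps. The sketch below illustrates the pairing with a hypothetical library: `toy_image_t`, `toylib_make_image`, `toylib_free_image_data`, and the allocation counter are invented for this example and are not part of stable-diffusion.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical image struct, invented for this example. */
typedef struct {
    uint32_t width;
    uint32_t height;
    uint32_t channel;
    uint8_t* data; /* owned by the library that allocated it */
} toy_image_t;

/* Tracks buffers handed out by the "library" and not yet returned. */
static int toylib_live_allocations = 0;

/* Library side: allocates a zeroed pixel buffer with the library's own allocator. */
toy_image_t toylib_make_image(uint32_t w, uint32_t h, uint32_t c) {
    assert(w > 0 && h > 0 && c > 0);
    toy_image_t img = {w, h, c, NULL};
    img.data = calloc((size_t)w * h * c, 1);
    if (img.data != NULL) toylib_live_allocations++;
    return img;
}

/* Library side: the only correct way for callers to release img.data.
 * Accepting NULL makes cleanup paths in FFI wrappers simpler. */
void toylib_free_image_data(uint8_t* data) {
    if (data == NULL) return;
    free(data); /* same allocator that calloc() above used */
    toylib_live_allocations--;
}
```

A Rust or Python wrapper would call the library's free function from its destructor (for example a `Drop` implementation) instead of handing the pointer to its own allocator.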

References

As this is a fork, all credits for the base architecture belong to the respective original project creators:
