
helgev-trap/cacheable-stable-diffusion.cpp

 
 


Cacheable stable-diffusion.cpp (Fork for Streaming API)

Diffusion model (SD, Flux, Wan, ...) inference in pure C/C++

This repository is a fork of leejet/stable-diffusion.cpp that adds condition caching (a Streaming API). While the upstream repo excels at stateless generation, this fork targets real-time video generation and high-throughput img2img streaming, where repeatedly re-running a heavy text encoder (e.g., the Qwen/LLM text encoder for Flux.2) becomes the main bottleneck.

With this fork's C API extensions, you can cache encoded prompt conditions and reference images, so subsequent frames skip the LLM encoder entirely.


🚀 Fork-Specific Features (What's New?)

This repository introduces several major enhancements compared to the upstream implementation, specifically targeting high-performance streaming and advanced denoising optimization.

1. Condition Caching (Streaming API)

This is the primary feature of this fork. It separates the expensive preparation phases (text encoding and reference image encoding) from the hot diffusion loop.

  • Goal: Real-time Video-to-Video and high-FPS img2img.
  • Key Functions: sd_encode_condition(), sd_encode_ref_image(), sd_img2img_with_cond().
  • Memory Safety: Includes dedicated FFI cleanup functions (sd_free_image_data, sd_free_images) so that memory allocated by the library is also released by the library, which keeps integration with Rust/Python robust across DLL boundaries.
  • 📘 Streaming API Design & Architecture
  • 📘 Streaming C API Reference
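The caching pattern itself does not depend on the library. The self-contained C++ sketch below illustrates it with a hypothetical `ConditionCache` class and a toy encoder standing in for the LLM: each distinct prompt is encoded once, and every later frame reuses the stored result. It is not the fork's implementation; `Condition`, `ConditionCache`, and `denoise_frame` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an encoded prompt condition (e.g. LLM hidden states).
struct Condition {
    std::vector<float> embedding;
};

// Hypothetical memoizing wrapper: each distinct prompt is encoded only once.
class ConditionCache {
public:
    using Encoder = std::function<Condition(const std::string&)>;

    explicit ConditionCache(Encoder encode) : encode_(std::move(encode)) {}

    // Returns the cached condition for `prompt`, running the encoder only on a miss.
    std::shared_ptr<const Condition> get(const std::string& prompt) {
        auto it = cache_.find(prompt);
        if (it != cache_.end()) {
            return it->second;  // hot path: no encoder call
        }
        std::shared_ptr<const Condition> cond = std::make_shared<Condition>(encode_(prompt));
        ++encode_calls_;
        cache_.emplace(prompt, cond);
        return cond;
    }

    std::size_t encode_calls() const { return encode_calls_; }

private:
    Encoder encode_;
    std::unordered_map<std::string, std::shared_ptr<const Condition>> cache_;
    std::size_t encode_calls_ = 0;
};

// Toy per-frame step that only reads the cached condition (stands in for diffusion).
float denoise_frame(float frame_value, const Condition& cond) {
    assert(!cond.embedding.empty());
    return frame_value * cond.embedding.front();
}
```

In the real API, `sd_encode_condition()` plays the role of the one-time encoder call and `sd_img2img_with_cond()` the role of the per-frame step.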

2. Advanced Denoising Caching (Spectrum, DBCache, etc.)

We have implemented several state-of-the-art caching algorithms that skip or predict UNet/DiT forward passes when the latent changes are small.

  • Algorithms: Spectrum (Chebyshev forecasting), DBCache (Block-level skipping), TaylorSeer, UCache, and EasyCache.
  • Performance: Depending on the model, sampler, and settings, these methods can reduce inference time by roughly 20% to 50% with minimal quality loss.
  • 📘 Denoising Caching Guide
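To give a feel for how step skipping works, here is a generic threshold-based reuse rule in C++: it accumulates the relative L1 change of the model input since the last full forward pass and, while that total stays below a threshold, reuses the cached residual (output minus input) instead of calling the model. This is a simplified illustration in the spirit of these methods, not the exact rule used by Spectrum, DBCache, TaylorSeer, UCache, or EasyCache in this fork; `StepCache`, `relative_l1`, and the threshold value are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

using Tensor = std::vector<float>;

// Relative L1 change: sum|a - b| / sum|b| (infinite if b is all zeros but a differs).
static float relative_l1(const Tensor& a, const Tensor& b) {
    assert(a.size() == b.size());
    float diff = 0.0f, norm = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        diff += std::fabs(a[i] - b[i]);
        norm += std::fabs(b[i]);
    }
    if (norm == 0.0f) {
        return diff == 0.0f ? 0.0f : std::numeric_limits<float>::infinity();
    }
    return diff / norm;
}

// Hypothetical step cache: reuses the last residual while the accumulated input change is small.
class StepCache {
public:
    using Model = std::function<Tensor(const Tensor&)>;

    StepCache(Model model, float threshold) : model_(std::move(model)), threshold_(threshold) {}

    Tensor forward(const Tensor& x) {
        if (!prev_input_.empty()) {
            accumulated_ += relative_l1(x, prev_input_);
        }
        prev_input_ = x;

        if (!residual_.empty() && accumulated_ < threshold_) {
            // Cheap path: approximate the model output as input + cached residual.
            ++skipped_;
            Tensor out(x.size());
            for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] + residual_[i];
            return out;
        }

        // Expensive path: run the model, refresh the residual, reset the accumulator.
        Tensor out = model_(x);
        assert(out.size() == x.size());
        ++computed_;
        residual_.assign(x.size(), 0.0f);
        for (std::size_t i = 0; i < x.size(); ++i) residual_[i] = out[i] - x[i];
        accumulated_ = 0.0f;
        return out;
    }

    int computed() const { return computed_; }
    int skipped() const { return skipped_; }

private:
    Model model_;
    float threshold_;
    Tensor prev_input_;
    Tensor residual_;
    float accumulated_ = 0.0f;
    int computed_ = 0;
    int skipped_ = 0;
};
```

A larger threshold skips more forward passes (faster) at the cost of a larger approximation error, which is the speed/quality trade-off these algorithms tune.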

3. Extended Model Support & Optimizations

Support for cutting-edge architectures and specific optimizations not found in the upstream repository.

4. Specialized CI & Packaging (Rust/FFI Friendly)

The GitHub Actions workflows have been enhanced to satisfy the requirements of downstream FFI consumers (like the Rust wrapper cacheable-sd-rs).

  • Windows Artifacts: In addition to the DLL, the CI now automatically packages the stable-diffusion.lib (import library). This is essential for linking the library correctly when using MSVC or Rust on Windows.
  • CI Robustness: Workflows have been refined to ensure binary artifacts are correctly bundled across CUDA, Vulkan, and CPU backends.
  • 📘 CI Packaging Diff Memo

📚 Documentation Index

Detailed guides for various components of the library:

| Category | Document | Description |
|---|---|---|
| Core API | C API Reference | Standard and Streaming API usage. |
| CLI | CLI Reference | Complete guide for the sd-cli tool. |
| Setup | Build Guide | Compiling with CUDA, Vulkan, Metal, etc. |
| Internal | Streaming API Design | Deep dive into GGML context management. |
| Advanced | Caching Guide | Accelerating inference via Spectrum/DBCache. |
| Models | Wan, Flux.1/2, SD3 | High-end model parameters and usage. |
| Specialized | PhotoMaker, Chroma, Anima | Identity, color science, and animation extensions. |
| VLM/VIV | Qwen-Image, Z-Image | Visual understanding and editing tools. |
| Tuning | Performance | Tips for speed and VRAM management. |

Upstream Features

This fork retains 100% compatibility with all the amazing features developed by the original stable-diffusion.cpp contributors:

  • Plain C/C++ implementation based on ggml, working similarly to llama.cpp.
  • Super lightweight and without external dependencies.
  • Supported Models: SD1.x, SD2.x, SDXL, SD3, FLUX.1/FLUX.2, Qwen-Image, Z-Image, Wan2.1/2.2, PhotoMaker, and more.
  • Supported Backends: CPU (AVX2/AVX512), CUDA, Vulkan, Metal, OpenCL, SYCL.
  • Supported Formats: PyTorch checkpoints (.ckpt/.pth), Safetensors (.safetensors), GGUF (.gguf).
  • Flash Attention to reduce memory usage.
  • LoRA, ControlNet, LCM, ESRGAN upscaling, and TAESD for faster latent decoding.

Quick Start

1. Build from Source

Since you will likely integrate this as a backend for another project, we recommend building from source. For full instructions, see the upstream Build Guide.

# Example: Building with Vulkan acceleration and Shared Libraries (C API)
mkdir build && cd build
cmake .. -DSD_VULKAN=ON -DSD_BUILD_SHARED_LIBS=ON
cmake --build . --config Release
# After a successful build, the CLI binary is at: build/bin/sd-cli
# The shared library is at: build/stable-diffusion.dll (Windows) or build/libstable-diffusion.so (Linux)

2. Standard CLI Usage

Download a model checkpoint (e.g., v1-5-pruned-emaonly.safetensors from Hugging Face).

./bin/sd-cli -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat"

For detailed arguments and use-cases (like img2img or LoRA), check out the CLI Guide.

3. Streaming API Quick Start (C/C++)

The key benefit of this fork is condition caching. Here is a minimal example:

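#include <stdlib.h> // free()
#include "stable-diffusion.h"

// Application-side placeholders (not part of the library): supply your own capture/display code.
extern int streaming;            // nonzero while input frames keep arriving
sd_image_t get_next_frame(void); // returns a frame whose .data the application malloc()ed
void render(sd_image_t image);

int main(void) {
    // 1. Initialize the context (once)
    sd_ctx_params_t ctx_params;
    sd_ctx_params_init(&ctx_params);
    ctx_params.diffusion_model_path = "flux-2-klein-4b.gguf";
    ctx_params.vae_path = "ae.safetensors";
    ctx_params.llm_path = "qwen3-4b.gguf";
    ctx_params.flash_attn = true;
    sd_ctx_t* ctx = new_sd_ctx(&ctx_params);
    if (ctx == NULL) return 1; // model loading failed

    // 2. Encode the prompt ONCE (the expensive LLM step)
    sd_condition_t* cond = sd_encode_condition(ctx, "cinematic oil painting", "", 512, 512);
    if (cond == NULL) {
        free_sd_ctx(ctx);
        return 1;
    }

    // 3. Process each video frame cheaply, reusing the cached condition (no re-encoding)
    while (streaming) {
        sd_image_t frame = get_next_frame();
        sd_image_t result = sd_img2img_with_cond(ctx, frame, cond, NULL, 0, 0.75f, 4, 1.0f, -1, NULL);
        if (result.data != NULL) {
            render(result);
            sd_free_image_data(result.data); // library-allocated: free it with the library's own function
        }
        free(frame.data); // application-allocated by get_next_frame()
    }

    // 4. Cleanup
    sd_free_condition(cond);
    free_sd_ctx(ctx);
    return 0;
}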

For a full working example including reference image caching, see examples/stream_img2img/main.cpp.
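Because result buffers are allocated inside the shared library, they should be released with the library's own function (`sd_free_image_data`) rather than the application's `free`, especially on Windows, where the DLL and the application may link different C runtimes. In C++ this can be enforced with a `std::unique_ptr` and a custom deleter. The sketch below uses hypothetical stand-in functions (`fake_lib_alloc_image_data`, `fake_lib_free_image_data`) so it compiles on its own; in a real integration the deleter would call `sd_free_image_data` instead, assuming its parameter type matches.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <utility>

// --- Hypothetical stand-ins for the library side (NOT the real API). ---
// In a real setup these would live inside the DLL, which is why buffers
// must be released through the DLL's own free function.
static int g_live_buffers = 0;

static std::uint8_t* fake_lib_alloc_image_data(std::size_t bytes) {
    auto* p = static_cast<std::uint8_t*>(std::calloc(bytes, 1));
    if (p != nullptr) ++g_live_buffers;
    return p;
}

static void fake_lib_free_image_data(std::uint8_t* data) {
    if (data != nullptr) {
        assert(g_live_buffers > 0);
        --g_live_buffers;
        std::free(data);
    }
}

// --- Application side: RAII handle that always frees through the library. ---
struct LibImageDeleter {
    void operator()(std::uint8_t* p) const noexcept { fake_lib_free_image_data(p); }
};
using LibImageData = std::unique_ptr<std::uint8_t, LibImageDeleter>;
```

Wrapping each `result.data` in such a handle right after `sd_img2img_with_cond()` returns means the buffer is released on every exit path, including early returns and exceptions.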

References

As this is a fork, all credits for the base architecture belong to the respective original project creators:

About

Make text-encoder and VLM encoding results in SD.cpp cacheable so they can be reused.
