State-of-the-art diffusion models for image and audio generation in MindSpore. We aim to keep the interface and usage fully consistent with huggingface/diffusers. Only the necessary changes are made to huggingface/diffusers, so that the transition is seamless for users coming from torch.
Important
This project is still under active development and many features are not yet well-supported. Any contribution is welcome!
Warning
Due to framework differences, some APIs will not be identical to huggingface/diffusers in the foreseeable future; see Limitations for details.
🤗 Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or training your own diffusion models, 🤗 Diffusers is a modular toolbox that supports both. Our library is designed with a focus on usability over performance, simple over easy, and customizability over abstractions.
🤗 Diffusers offers three core components:
- State-of-the-art diffusion pipelines that can be run in inference with just a few lines of code.
- Interchangeable noise schedulers for different diffusion speeds and output quality.
- Pretrained models that can be used as building blocks, and combined with schedulers, for creating your own end-to-end diffusion systems.
Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the from_pretrained method to load any pretrained diffusion model (browse the Hub for 19000+ checkpoints):
```diff
- from diffusers import DiffusionPipeline
+ from mindone.diffusers import DiffusionPipeline
- import torch
+ import mindspore

  pipe = DiffusionPipeline.from_pretrained(
      "stabilityai/stable-diffusion-xl-base-1.0",
-     torch_dtype=torch.float16,
+     mindspore_dtype=mindspore.float16,
      use_safetensors=True
  )

  prompt = "An astronaut riding a green horse"

  images = pipe(prompt=prompt)[0][0]
```

You can also dig into the models and schedulers toolbox to build your own diffusion system:
```python
from mindone.diffusers import DDPMScheduler, UNet2DModel
from PIL import Image
from mindspore import ops

# Load the scheduler and the UNet from the same checkpoint.
scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")
model = UNet2DModel.from_pretrained("google/ddpm-cat-256")
scheduler.set_timesteps(50)  # number of denoising steps used at inference

# Start from pure Gaussian noise with the shape the model expects.
sample_size = model.config.sample_size
noise = ops.randn((1, 3, sample_size, sample_size))
input = noise

# Denoising loop: predict the noise residual, then step to the previous sample.
for t in scheduler.timesteps:
    noisy_residual = model(input, t)[0]
    prev_noisy_sample = scheduler.step(noisy_residual, t, input)[0]
    input = prev_noisy_sample

# Map from [-1, 1] to [0, 255] and convert to a PIL image.
image = (input / 2 + 0.5).clamp(0, 1)
image = image.permute(0, 2, 3, 1).asnumpy()[0]
image = Image.fromarray((image * 255).round().astype("uint8"))
image
```

Check out the Quickstart to launch your diffusion journey today!
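The last three lines of the example above turn the denoised tensor into a displayable image. As a framework-free illustration, the same arithmetic can be sketched in plain Python on nested lists. The helper name to_uint8_hwc is hypothetical and is not part of mindone.diffusers.

```python
def to_uint8_hwc(chw):
    """Map a channels-first image in [-1, 1] to channels-last integers in [0, 255].

    `chw` is a nested list indexed as [channel][row][column]. The steps mirror
    the rescale, clamp, channels-last permute, and rounding in the example above.
    """
    channels, height, width = len(chw), len(chw[0]), len(chw[0][0])

    def convert(value):
        scaled = value / 2 + 0.5              # [-1, 1] -> [0, 1]
        clamped = min(max(scaled, 0.0), 1.0)  # out-of-range values are clipped
        return int(round(clamped * 255))      # [0, 1] -> 0..255

    return [
        [[convert(chw[c][y][x]) for c in range(channels)] for x in range(width)]
        for y in range(height)
    ]


# A 3-channel, 1x2 toy "image": the second pixel has out-of-range values.
toy = [[[-1.0, 2.0]], [[0.0, 1.0]], [[1.0, -3.0]]]
pixels = to_uint8_hwc(toy)
```

Clamping before scaling matters: without it, values outside [-1, 1] would wrap around when cast to an 8-bit integer type.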
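Limitations relative to huggingface/diffusers:
- In from_pretrained, torch_dtype is renamed to mindspore_dtype.
- In from_pretrained, device_map, max_memory, offload_folder, offload_state_dict, and low_cpu_mem_usage will not be supported.
- The default value of return_dict is changed to False, because GRAPH_MODE does not allow constructing an instance of the output class.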
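Because return_dict defaults to False, calls return a plain tuple and results are read by position, which is why the text-to-image example indexes pipe(prompt=prompt)[0][0]. A minimal sketch of that convention in plain Python follows. ToyOutput and toy_pipeline are hypothetical stand-ins, not mindone.diffusers classes.

```python
from dataclasses import dataclass


@dataclass
class ToyOutput:
    """Hypothetical stand-in for a pipeline output object."""
    images: list


def toy_pipeline(prompt, return_dict=False):
    """Hypothetical pipeline: returns a tuple unless return_dict=True."""
    output = ToyOutput(images=[f"<image for: {prompt}>"])
    if return_dict:
        return output
    # Tuple form: fields in declaration order, so images sit at index 0.
    return (output.images,)


result = toy_pipeline("An astronaut riding a green horse")
first_image = result[0][0]  # first field of the tuple, then first image
```

Code ported from torch that reads attributes such as .images on the result needs to switch to positional indexing under this default.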
In huggingface/diffusers, the encoder output is posterior = DiagonalGaussianDistribution(latent), which can be sampled with posterior.sample().
Here, only the latent can be output, and sampling is then done through AutoencoderKL.diag_gauss_dist.sample(latent).
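To make the difference concrete, here is a framework-free sketch of sampling from a latent that packs the mean and log-variance of a diagonal Gaussian. The class ToyDiagGaussDist, its flat-list layout, and the seed argument are assumptions for illustration and do not reproduce the mindone.diffusers implementation.

```python
import math
import random


class ToyDiagGaussDist:
    """Hypothetical stateless helper: it stores no latent, so one instance
    can be reused and the latent is passed in on every call."""

    def sample(self, latent, seed=None):
        # Assumed layout: first half is the mean, second half the log-variance.
        half = len(latent) // 2
        mean, logvar = latent[:half], latent[half:]
        rng = random.Random(seed)
        # Reparameterised draw: mean + std * eps, with std = exp(0.5 * logvar).
        return [
            m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, logvar)
        ]


dist = ToyDiagGaussDist()
latent = [0.5, -1.0, 0.0, -60.0]  # means (0.5, -1.0), log-variances (0.0, -60.0)
z = dist.sample(latent, seed=0)
```

Passing the latent into sample, rather than constructing a new distribution object around it, matches the call shape AutoencoderKL.diag_gauss_dist.sample(latent) described above.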
Hacked together by @geniuspatrick. All credit goes to huggingface/diffusers and original contributors.