Add enable_vae_tiling to AllegroPipeline, fix example #10212
yiyixuxu merged 1 commit into huggingface:main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
a-r-r-o-w left a comment
Thanks!
Regarding the high number of inference steps: inference with 50 steps produces noisy videos, and the default in the official repo is 100 as well. For faster inference, we could explore the different caching techniques soon.
@a-r-r-o-w Is lowering

```python
import torch
from diffusers import AutoencoderKLAllegro, AllegroPipeline
from diffusers.utils import export_to_video

# Load the VAE in fp32; the rest of the pipeline runs in bf16.
vae = AutoencoderKLAllegro.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32)
pipe = AllegroPipeline.from_pretrained("rhymes-ai/Allegro", vae=vae, torch_dtype=torch.bfloat16).to("cuda")
# Tiling currently has to be enabled on the VAE directly.
pipe.vae.enable_tiling()

prompt = (
    "A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, "
    "the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this "
    "location might be a popular spot for docking fishing boats."
)

video = pipe(
    prompt,
    num_inference_steps=100,
    num_frames=22,
    guidance_scale=7.5,
    max_sequence_length=512,
    generator=torch.Generator().manual_seed(0),
).frames[0]
export_to_video(video, "AllegroPipeline.mp4", fps=15)
```

AllegroPipeline.mp4
I believe that is expected. My memory is a bit hazy since it's been a while since we integrated it, but IIRC Allegro only works at 88 frames and specific resolutions (?). I think it is okay to not run the full test if it's time-consuming. What you could do instead is: save the outputs of all intermediate transformer blocks (and the output after decoding) for, say, 2 inference steps, and compare that against your sincos/rope changes by looking at the absmax of the intermediates from both runs. This should be a sufficient check imo.
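For reference, a minimal sketch of that kind of comparison using plain PyTorch forward hooks. It assumes the Allegro transformer exposes its blocks as `transformer_blocks` (adjust the attribute name if needed), and the helper name `collect_block_outputs` is made up for illustration:

```python
import torch

def collect_block_outputs(pipe, prompt, **pipe_kwargs):
    """Run the pipeline once and record every transformer block's output."""
    records = []
    hooks = []

    def make_hook(idx):
        def hook(module, args, output):
            # Some blocks return tuples; keep only the hidden-states tensor.
            hidden = output[0] if isinstance(output, tuple) else output
            records.append((idx, hidden.detach().float().cpu()))
        return hook

    for idx, block in enumerate(pipe.transformer.transformer_blocks):
        hooks.append(block.register_forward_hook(make_hook(idx)))
    try:
        out = pipe(prompt, **pipe_kwargs)
    finally:
        for h in hooks:
            h.remove()
    return records, out

# Usage sketch: run the reference and the modified pipeline with the same seed
# and only a couple of inference steps, then inspect the absmax differences.
# ref_records, _ = collect_block_outputs(ref_pipe, prompt, num_inference_steps=2,
#                                        generator=torch.Generator().manual_seed(0))
# new_records, _ = collect_block_outputs(new_pipe, prompt, num_inference_steps=2,
#                                        generator=torch.Generator().manual_seed(0))
# for (i, a), (_, b) in zip(ref_records, new_records):
#     print(i, (a - b).abs().max().item())
```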
What does this PR do?
Allegro doesn't support VAE decoding without tiling, and the pipeline is missing `enable_vae_tiling` etc.:

`diffusers/src/diffusers/models/autoencoders/autoencoder_kl_allegro.py`, line 851 in cef0e36
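For context, an illustrative stub of the convenience methods being added, following the pattern other diffusers video pipelines use. The real class subclasses `DiffusionPipeline`, and the merged code may differ in which companion methods (e.g. slicing) are included:

```python
class _AllegroPipelineSketch:  # illustrative stub, not the real AllegroPipeline
    def __init__(self, vae):
        self.vae = vae

    def enable_vae_tiling(self):
        # Decode latents in overlapping tiles so large frames fit in memory.
        self.vae.enable_tiling()

    def disable_vae_tiling(self):
        self.vae.disable_tiling()

    def enable_vae_slicing(self):
        # Decode one batch element at a time to reduce peak memory.
        self.vae.enable_slicing()

    def disable_vae_slicing(self):
        self.vae.disable_slicing()
```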
Note that the default `num_inference_steps=100` is very slow (~1h on an A40, only slightly faster on an A6000 Ada), so the example could have the value changed as well.

Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@sayakpaul @yiyixuxu @DN6