Feature/zimage inpaint pipeline #13006
Pull request overview
This PR adds a comprehensive inpainting pipeline for Z-Image, extending the existing Z-Image family of pipelines (text-to-image, img2img, controlnet) with mask-based inpainting capabilities. The implementation follows established patterns from other inpainting pipelines in the diffusers library while adapting to Z-Image's specific requirements (e.g., complex64 RoPE embeddings, flow matching scheduler).
Changes:
- Implemented ZImageInpaintPipeline with full inpainting support including mask blending and strength-based denoising control
- Added comprehensive test suite covering inference, batch processing, strength validation, mask functionality, VAE tiling, and device offloading
- Integrated the pipeline into auto_pipeline infrastructure with proper model mapping for "z-image" model type
- Updated all necessary __init__.py files and dummy objects for proper exports
- Added documentation with usage examples
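The mask blending and strength-based denoising control mentioned above can be sketched roughly as follows. This is a minimal illustration of the pattern commonly used by inpainting pipelines, not the code from this PR: the function names `get_start_index` and `blend_with_mask` are hypothetical, and plain Python lists stand in for latent tensors.

```python
def get_start_index(num_inference_steps: int, strength: float) -> int:
    """Return the index of the first scheduler step that is actually run.

    Hypothetical helper: strength=1.0 runs every step, strength=0.0 runs none.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be in [0, 1], got {strength}")
    # Number of steps to keep, counted from the end of the schedule.
    steps_to_run = min(num_inference_steps * strength, num_inference_steps)
    return int(max(num_inference_steps - steps_to_run, 0))


def blend_with_mask(denoised, original, mask):
    """Keep `original` where mask == 0 and `denoised` where mask == 1.

    Lists stand in for latent tensors; a real pipeline would do this
    element-wise on tensors after every denoising step.
    """
    return [(1.0 - m) * o + m * d for d, o, m in zip(denoised, original, mask)]
```

With these assumptions, a lower `strength` skips more of the early schedule and so stays closer to the input image, while the mask decides per element whether the denoised value or the original value survives each step.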
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/diffusers/pipelines/z_image/pipeline_z_image_inpaint.py | New inpainting pipeline implementation with prepare_mask_latents, prepare_latents, and the main __call__ method for mask-based image inpainting |
| tests/pipelines/z_image/test_z_image_inpaint.py | Comprehensive test suite including inference tests, strength parameter validation, mask functionality tests, and compatibility tests |
| src/diffusers/pipelines/z_image/__init__.py | Added ZImageInpaintPipeline to module exports |
| src/diffusers/pipelines/__init__.py | Added ZImageInpaintPipeline to main pipelines module exports |
| src/diffusers/__init__.py | Added ZImageInpaintPipeline to top-level diffusers exports |
| src/diffusers/pipelines/auto_pipeline.py | Mapped ZImageInpaintPipeline to "z-image" in AUTO_INPAINT_PIPELINES_MAPPING |
| src/diffusers/utils/dummy_torch_and_transformers_objects.py | Added dummy ZImageInpaintPipeline class for when dependencies are not available |
| docs/source/en/api/pipelines/z_image.md | Added inpainting section with usage example and API reference |
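As a rough illustration of what mapping a pipeline to "z-image" in an inpaint mapping makes possible, the sketch below resolves an inpainting class from the name of any pipeline in the same family. It is a simplified stand-in, not the diffusers implementation: the placeholder classes, the two mapping names, and the helpers `model_type_of` and `inpaint_class_for` are all assumptions made for the example.

```python
from collections import OrderedDict


class ZImagePipeline:
    """Placeholder for the text-to-image pipeline class."""


class ZImageInpaintPipeline:
    """Placeholder for the inpainting pipeline class."""


# Hypothetical registries: one per task, keyed by model type.
TEXT2IMAGE_MAPPING = OrderedDict([("z-image", ZImagePipeline)])
INPAINT_MAPPING = OrderedDict([("z-image", ZImageInpaintPipeline)])


def model_type_of(pipeline_class_name: str) -> str:
    """Find the model type whose registries contain the given class name."""
    for mapping in (TEXT2IMAGE_MAPPING, INPAINT_MAPPING):
        for model_type, pipeline_class in mapping.items():
            if pipeline_class.__name__ == pipeline_class_name:
                return model_type
    raise ValueError(f"no registered model type for {pipeline_class_name!r}")


def inpaint_class_for(pipeline_class_name: str):
    """Return the inpainting class registered for the same model type."""
    return INPAINT_MAPPING[model_type_of(pipeline_class_name)]
```

Under these assumptions, a checkpoint saved as a text-to-image pipeline can be opened for inpainting, because both classes are registered under the same model type key.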
@yiyixuxu Ready for re-review.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@asomoza How does this look? I still have to do cleanup, but I mean the general structure.
Might as well, I'll integrate it, can always revert and separate into another PR if preferred.
Updated the pipeline structure to include ZImageInpaintPipeline alongside ZImagePipeline and ZImageImg2ImgPipeline.
Implemented the ZImageInpaintPipeline class for inpainting tasks, including necessary methods for encoding prompts, preparing masked latents, and denoising.
Enhanced the auto_pipeline to map the new ZImageInpaintPipeline for inpainting generation tasks.
Added unit tests for ZImageInpaintPipeline to ensure functionality and performance.
Updated dummy objects to include ZImageInpaintPipeline for testing purposes.
- Add torch.empty fix for x_pad_token and cap_pad_token in test
- Add # Copied from annotations for encode_prompt methods
- Add documentation with usage example and autodoc directive
Add batch size validation and callback handling fixes per review, using diffusers conventions rather than suggested code verbatim.
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
- Add missing is_torch_xla_available import for TPU support
- Add xm.mark_step() in denoising loop for proper XLA execution
- Add check_inputs() method for comprehensive input validation
- Call check_inputs() at the start of __call__
Addresses PR review feedback from @asomoza.
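For readers unfamiliar with the check_inputs() convention referred to above, here is a minimal sketch of the kind of up-front validation such a method usually performs. It is an illustration under stated assumptions, not the method added in this PR: the signature, the error messages, and the `multiple_of=16` divisibility requirement are hypothetical. Only the list of allowed tensor names is taken from the `_callback_tensor_inputs` line quoted in this thread.

```python
ALLOWED_TENSOR_INPUTS = ["latents", "prompt_embeds", "mask", "masked_image_latents"]


def check_inputs(prompt, strength, height, width, requested_tensor_inputs, multiple_of=16):
    """Raise ValueError on the first invalid argument; return None otherwise.

    Hypothetical signature; `multiple_of` is an assumed size constraint.
    """
    if not isinstance(prompt, (str, list)):
        raise ValueError(f"`prompt` must be a str or a list, got {type(prompt).__name__}")
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"`strength` must be in [0, 1], got {strength}")
    if height % multiple_of != 0 or width % multiple_of != 0:
        raise ValueError(
            f"`height` and `width` must be divisible by {multiple_of}, got {height}x{width}"
        )
    # Callbacks may only request tensors that the pipeline declares it exposes.
    unknown = [name for name in requested_tensor_inputs if name not in ALLOWED_TENSOR_INPUTS]
    if unknown:
        raise ValueError(f"unsupported callback tensor inputs: {unknown}")
```

Calling a check like this at the start of `__call__` means a bad argument fails immediately with a readable message, before any encoding or denoising work starts.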
Force-pushed from 8d419db to 9a74d2f.
@asomoza Apologies for the delay; the cleanup and requested changes are done.
Force-pushed from 9a74d2f to ac40401.
@yiyixuxu @sayakpaul @asomoza Any chance we can prioritize the review and merge of inpaint pipelines? In general, those are in high demand and very commonly used, but they are still often overlooked when adding support for new models.
    model_cpu_offload_seq = "text_encoder->transformer->vae"
    _optional_components = []
    _callback_tensor_inputs = ["latents", "prompt_embeds", "mask", "masked_image_latents"]
You're adding mask and masked_image_latents here, but is there an applicable use case? When would we use them on each step? Not a blocker, just curious.
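To make the mechanism behind this question concrete, the sketch below shows how a per-step callback could receive, and optionally replace, the tensors named in a list like `_callback_tensor_inputs`. It is a simplified stand-in rather than the pipeline's actual loop: `run_loop`, the halving "denoising" step, and the example callback are hypothetical, and the callback that clears the mask is there only to show the data flow, not to suggest that doing so is useful.

```python
def run_loop(num_steps, state, callback=None, tensor_inputs=("latents",)):
    """Toy denoising loop that exposes selected entries of `state` to a callback."""
    for step in range(num_steps):
        # Stand-in for one denoising step.
        state["latents"] = [x * 0.5 for x in state["latents"]]
        if callback is not None:
            # Hand the callback only the tensors it asked for.
            callback_kwargs = {name: state[name] for name in tensor_inputs}
            outputs = callback(step, callback_kwargs) or {}
            # Anything the callback returns replaces the loop's copy.
            for name in tensor_inputs:
                state[name] = outputs.get(name, state[name])
    return state


def clear_mask_after_first_step(step, tensors):
    """Hypothetical callback: zero the mask from the second step onwards."""
    if step >= 1:
        tensors["mask"] = [0.0] * len(tensors["mask"])
    return tensors
```

A call such as `run_loop(2, {"latents": [4.0], "mask": [1.0]}, clear_mask_after_first_step, ("latents", "mask"))` lets the callback see and overwrite the mask on every step, which is what listing it as a callback tensor input would allow.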
Thanks, if we make the change to the cfg, we can apply it to all the pipelines at the same time later.
What does this PR do?
This PR adds an inpainting pipeline for Z-Image. A summary of the changes is below:
Closes issue #12752
Tested using a simple script:
Testing script
LoRA functionality is also supported (inherited from ZImageLoraLoaderMixin).
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@yiyixuxu @asomoza @sayakpaul