
Allow VideoProcessor to Accept Single Image Inputs #13084

Open
dg845 wants to merge 3 commits into main from video-processor-accept-imagelike-inputs


@dg845
Collaborator

dg845 commented Feb 5, 2026

What does this PR do?

This PR adds support for single-image inputs to VideoProcessor.preprocess_video. Because VideoProcessor.preprocess_video uses VaeImageProcessor.preprocess under the hood, the PR also makes preprocess_video forward keyword arguments to preprocess, so it supports any argument that preprocess accepts, such as resize_mode.
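For illustration, here is a minimal sketch of how image-like inputs can be normalized into a batched video before per-frame preprocessing. The helper as_video_batch is hypothetical (it is not the diffusers implementation) and assumes channels-last NumPy arrays. In particular, it assumes a 4-D array is one video rather than a batch of images, an ambiguity the real method may resolve differently.

```python
import numpy as np


def as_video_batch(video) -> np.ndarray:
    """Hypothetical helper: normalize input to (batch, frames, H, W, C).

    Assumed input layouts:
      - (H, W, C): a single image -> one-frame video in a batch of 1
      - (F, H, W, C): a single video -> batch of 1
      - (B, F, H, W, C): already batched, returned as-is
      - list of (H, W, C) images: treated as the frames of one video
    """
    if isinstance(video, list):
        # Stack individual frames into one (F, H, W, C) video.
        video = np.stack([np.asarray(frame) for frame in video], axis=0)
    video = np.asarray(video, dtype=np.float32)
    if video.ndim == 3:
        # Single image: add a frame axis and a batch axis.
        video = video[None, None]
    elif video.ndim == 4:
        # Single video: add a batch axis.
        video = video[None]
    elif video.ndim != 5:
        raise ValueError(f"Expected 3 to 5 dims, got shape {video.shape}")
    return video
```

With this normalization in front, the rest of a video pipeline only ever sees 5-D input, so a single image becomes a one-frame video rather than a special case.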

Changelist

  1. Add support for single-image inputs to VideoProcessor.preprocess_video.
  2. Make VideoProcessor.preprocess_video and VideoProcessor.postprocess_video forward keyword arguments to VaeImageProcessor.preprocess and VaeImageProcessor.postprocess, respectively.
  3. Improve docstrings in VideoProcessor.
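The keyword-forwarding change in item 2 can be sketched with toy classes. ToyImageProcessor and ToyVideoProcessor are hypothetical stand-ins, not the diffusers classes. The accepted resize_mode values ("default", "fill", "crop") are assumed to mirror VaeImageProcessor, and the toy does no actual resizing.

```python
import numpy as np


class ToyImageProcessor:
    """Hypothetical stand-in for VaeImageProcessor; not the diffusers class."""

    def preprocess(self, image: np.ndarray, resize_mode: str = "default") -> np.ndarray:
        if resize_mode not in ("default", "fill", "crop"):
            raise ValueError(f"Unknown resize_mode: {resize_mode!r}")
        # A real processor would resize/pad/crop according to resize_mode;
        # this toy only records the mode and rescales [0, 255] to [-1, 1].
        self.last_resize_mode = resize_mode
        return image.astype(np.float32) / 127.5 - 1.0


class ToyVideoProcessor(ToyImageProcessor):
    """Hypothetical stand-in for VideoProcessor."""

    def preprocess_video(self, video: np.ndarray, **kwargs) -> np.ndarray:
        # Treat a single (H, W, C) image as a one-frame (F, H, W, C) video.
        if video.ndim == 3:
            video = video[None]
        # Forward extra keyword arguments (e.g. resize_mode) unchanged to the
        # per-image method, so new per-image options need no changes here.
        return np.stack([self.preprocess(frame, **kwargs) for frame in video], axis=0)
```

One trade-off of this design: a misspelled keyword is not caught by preprocess_video itself; it surfaces as a TypeError raised from the per-image preprocess call.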

Inspired by discussion in #13058.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@sayakpaul
@yiyixuxu

dg845 requested a review from sayakpaul on February 5, 2026, 02:50
@HuggingFaceDocBuilderDev

The docs for this PR are published to a preview endpoint that reflects all documentation changes. They remain available until 30 days after the last update.


```diff
 def postprocess_video(
-    self, video: torch.Tensor, output_type: str = "np"
+    self, video: torch.Tensor, output_type: str = "np", **kwargs
```
Member

What would kwargs facilitate?

Collaborator Author


Currently, this lets callers pass the do_denormalize flag through to VaeImageProcessor.postprocess. But it's intended more as a forward-looking change: postprocess_video will support any arguments that postprocess might add in the future.
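A minimal sketch of the postprocess side, assuming denormalization maps [-1, 1] back to [0, 1]. Both functions are hypothetical toys; in particular, the real do_denormalize argument may be a per-image list rather than the single bool used here.

```python
import numpy as np


def toy_postprocess(image: np.ndarray, output_type: str = "np", do_denormalize: bool = True) -> np.ndarray:
    """Hypothetical per-image postprocess; only 'np' output is supported."""
    if output_type != "np":
        raise ValueError(f"Unsupported output_type: {output_type!r}")
    if do_denormalize:
        # Map model outputs from [-1, 1] back to [0, 1].
        image = np.clip(image / 2 + 0.5, 0.0, 1.0)
    return image


def toy_postprocess_video(video: np.ndarray, output_type: str = "np", **kwargs) -> np.ndarray:
    """Hypothetical video wrapper over a (batch, frames, H, W, C) array."""
    # Postprocess each video's frames, forwarding extra keyword arguments
    # (such as do_denormalize) unchanged to the per-image function.
    return np.stack(
        [toy_postprocess(frames, output_type=output_type, **kwargs) for frames in video],
        axis=0,
    )
```

If toy_postprocess later gained another option, toy_postprocess_video would accept it without modification, which is the forward-looking benefit described in the reply.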

Member

sayakpaul left a comment


Just one comment. Could we create an LTX-2-specific video processor that subclasses the current one and implements the center-cropping logic?

@dg845
Collaborator Author

dg845 commented Feb 5, 2026

For LTX-2, the idea is that after this PR we can call preprocess_video(..., resize_mode="crop") to preprocess the conditions with center cropping, as the original code does. I tried this on the bird image used in the FLF2V example, and the result looks very close to the original code's output:

[Image: preprocessing result for the FLF2V bird image]
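For reference, the geometry behind a "crop"-style resize can be sketched as: scale the image so it covers the target size while keeping its aspect ratio, then center-crop the excess. This is an assumption about what resize_mode="crop" does, and it is not necessarily how the original LTX-2 code crops. The function name is hypothetical, and it uses nearest-neighbour sampling for simplicity where real processors use smoother filters.

```python
import numpy as np


def resize_and_center_crop(image: np.ndarray, height: int, width: int) -> np.ndarray:
    """Hypothetical resize-to-cover + center crop for an (H, W[, C]) array."""
    src_h, src_w = image.shape[:2]
    # Scale so both sides are at least the target size (aspect ratio kept).
    scale = max(width / src_w, height / src_h)
    new_h = max(height, round(src_h * scale))
    new_w = max(width, round(src_w * scale))
    # Nearest-neighbour resize via index lookup.
    rows = np.minimum((np.arange(new_h) * src_h / new_h).astype(int), src_h - 1)
    cols = np.minimum((np.arange(new_w) * src_w / new_w).astype(int), src_w - 1)
    resized = image[rows[:, None], cols[None, :]]
    # Crop the excess equally from both sides of the longer dimension.
    top = (new_h - height) // 2
    left = (new_w - width) // 2
    return resized[top:top + height, left:left + width]
```

For a 768x1024 (H x W) image and a 512x512 target, this scales to 512x683 and trims 85 columns on the left and 86 on the right.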

@sayakpaul
Member

@yiyixuxu WDYT?


Labels

None yet

Projects

None yet


3 participants