Interpolate fix on CUDA for large output tensors #10067
Conversation
sayakpaul left a comment
Thanks, Pedro. The fix indeed looks like 🧠
I think this should be tested as well. Could we add a test here?
```python
scale_factor = (
    2 if output_size is None else max([f / s for f, s in zip(output_size, hidden_states.shape[-2:])])
)
if hidden_states.numel() * scale_factor > pow(2, 31):
```
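In plain Python, the heuristic in the diff above amounts to the following sketch (the function name and example shapes are illustrative, not part of the PR):

```python
import math

def needs_contiguous_workaround(shape, output_size=None):
    # Mirror the PR's check: fall back to a contiguous layout when the
    # interpolation output could exceed 2**31 elements.
    numel = math.prod(shape)
    scale_factor = (
        2 if output_size is None
        else max(f / s for f, s in zip(output_size, shape[-2:]))
    )
    return numel * scale_factor > pow(2, 31)
```

For example, a `(1, 3, 64, 64)` input stays far below the threshold, while a `(1, 1025, 1024, 1024)` input upscaled 2x crosses it.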
Consider keeping pow(2, 31) in a variable and then using it.
I feel that pow(2, 31) is easy to understand and provides explicit documentation about what's happening. Using another level of indirection wouldn't help, unless we are concerned about performance, in which case I can use a constant.
```python
# if `output_size` is passed we force the interpolation output
# size and do not make use of `scale_factor=2`
if self.interpolate:
    # upsample_nearest_nhwc also fails when the number of output elements is large
```
Do we need to check for channels_first or channels_last memory layout?
My assumption is it will behave the same way for any non-contiguous layouts, and that calling .contiguous() on a contiguous tensor will be a no-op. I'm also mimicking the same structure as in lines 161-163 above.
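The no-op claim is easy to verify (a small sketch, assuming PyTorch is installed; the tensor shape is arbitrary):

```python
import torch

x = torch.randn(2, 3, 4, 4)
# .contiguous() on an already-contiguous tensor returns the tensor itself.
assert x.is_contiguous()
assert x.contiguous() is x

# A channels_last tensor is not contiguous in the default memory format,
# so .contiguous() produces a rearranged copy.
xl = x.to(memory_format=torch.channels_last)
assert not xl.is_contiguous()
assert xl.contiguous().is_contiguous()
```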
Sure, I can add some tests. They will require a GPU and a good amount of GPU RAM, but I think they should fit in the CI environment.

Do you have an estimate of how much? Shouldn't 16GB be enough?

I think so, yes, I'll check.

Also, I'm cool with the code as it is!
* Workaround for upscale with large output tensors. Fixes huggingface#10040.
* Fix scale when output_size is given
* Style

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Thanks a lot for merging, and my apologies for being slow to react!
Not the same, but related. That issue was fixed a while ago; we could check the minimum PyTorch version and only apply this similar workaround if necessary.
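A version gate along those lines could look like this sketch; the cutoff release is a placeholder, since the exact PyTorch version containing the upstream fix would need to be confirmed against pytorch/pytorch#141831:

```python
from packaging import version

# PLACEHOLDER: substitute the first PyTorch release that ships the fix.
FIXED_IN = "2.6.0"

def workaround_needed(torch_version: str) -> bool:
    # Apply the contiguous() fallback only on versions that still have the bug.
    return version.parse(torch_version) < version.parse(FIXED_IN)
```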
Fixes #10040.
As described in pytorch/pytorch#141831, there is a silent bug in PyTorch (CUDA) that causes the result of the upsampling operation to be wrong when the output size is beyond a threshold.
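For a sense of scale, 2**31 output elements is reached at resolutions plausible for large-image diffusion (the shapes below are illustrative, not taken from the issue):

```python
INT32_LIMIT = pow(2, 31)  # 2_147_483_648 elements

def output_numel(n, c, h, w):
    # Element count of an NCHW output tensor.
    return n * c * h * w

# A 2x nearest upsample producing a (1, 256, 4096, 4096) output has
# ~4.3e9 elements, past the limit where results become silently wrong.
assert output_numel(1, 256, 4096, 4096) > INT32_LIMIT
# A (1, 256, 1024, 1024) output stays safely below it.
assert output_numel(1, 256, 1024, 1024) < INT32_LIMIT
```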
cc @sayakpaul @asomoza as they are involved in #10040. In addition to that issue, I think the silent failure mode could come up in other cases as well and would be difficult to track.