Interpolate fix on CUDA for large output tensors #10067
Conversation
sayakpaul left a comment
Thanks, Pedro. The fix indeed looks like 🧠
I think this should be tested as well. Could we add a test here?
```python
scale_factor = (
    2 if output_size is None else max([f / s for f, s in zip(output_size, hidden_states.shape[-2:])])
)
if hidden_states.numel() * scale_factor > pow(2, 31):
```
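In plain Python, the heuristic in the diff above amounts to the following sketch (the function name and example shapes are illustrative, not part of the PR):

```python
import math

def needs_contiguous_workaround(shape, output_size=None):
    # Mirror the PR's check: fall back to a contiguous layout when the
    # interpolation output could exceed 2**31 elements.
    numel = math.prod(shape)
    scale_factor = (
        2 if output_size is None
        else max(f / s for f, s in zip(output_size, shape[-2:]))
    )
    return numel * scale_factor > pow(2, 31)
```

For example, a `(1, 3, 64, 64)` input stays far below the threshold, while a `(1, 1025, 1024, 1024)` input upscaled 2x crosses it.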
Consider keeping pow(2, 31) in a variable and then using it.
I feel that pow(2, 31) is easy to understand and provides explicit documentation about what's happening. Using another level of indirection wouldn't help, unless we are concerned about performance, in which case I can use a constant.
```python
# if `output_size` is passed we force the interpolation output
# size and do not make use of `scale_factor=2`
if self.interpolate:
    # upsample_nearest_nhwc also fails when the number of output elements is large
```
Do we need to check for channels_first or channels_last memory layout?
My assumption is it will behave the same way for any non-contiguous layouts, and that calling .contiguous() on a contiguous tensor will be a no-op. I'm also mimicking the same structure as in lines 161-163 above.
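The no-op claim is easy to verify (a small sketch, assuming PyTorch is installed; the tensor shape is arbitrary):

```python
import torch

x = torch.randn(2, 3, 4, 4)
# .contiguous() on an already-contiguous tensor returns the tensor itself.
assert x.is_contiguous()
assert x.contiguous() is x

# A channels_last tensor is not contiguous in the default memory format,
# so .contiguous() produces a rearranged copy.
xl = x.to(memory_format=torch.channels_last)
assert not xl.is_contiguous()
assert xl.contiguous().is_contiguous()
```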
Sure, I can add some tests. They will require a GPU and a good amount of GPU RAM, but I think they should fit in the CI environment.

Do you have an estimate of how much? Shouldn't 16GB be enough?

I think so, yes, I'll check.

Also, I'm cool with the code as it is!
* Workaround for upscale with large output tensors. Fixes huggingface#10040.
* Fix scale when output_size is given
* Style

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Thanks a lot for merging, and my apologies for being slow to react!
Not the same, but related. That issue was fixed a while ago; we could check the minimum PyTorch version and only apply this similar workaround if necessary.
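A version gate along those lines could look like this sketch; the cutoff release is a placeholder, since the exact PyTorch version containing the upstream fix would need to be confirmed against pytorch/pytorch#141831:

```python
from packaging import version

# PLACEHOLDER: substitute the first PyTorch release that ships the fix.
FIXED_IN = "2.6.0"

def workaround_needed(torch_version: str) -> bool:
    # Apply the contiguous() fallback only on versions that still have the bug.
    return version.parse(torch_version) < version.parse(FIXED_IN)
```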
Fixes #10040.
As described in pytorch/pytorch#141831, there is a silent bug in PyTorch (CUDA) that causes the result of the upsampling operation to be wrong when the output size is beyond a threshold.
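For a sense of scale, 2**31 output elements is reached at resolutions plausible for large-image diffusion (the shapes below are illustrative, not taken from the issue):

```python
INT32_LIMIT = pow(2, 31)  # 2_147_483_648 elements

def output_numel(n, c, h, w):
    # Element count of an NCHW output tensor.
    return n * c * h * w

# A 2x nearest upsample producing a (1, 256, 4096, 4096) output has
# ~4.3e9 elements, past the limit where results become silently wrong.
assert output_numel(1, 256, 4096, 4096) > INT32_LIMIT
# A (1, 256, 1024, 1024) output stays safely below it.
assert output_numel(1, 256, 1024, 1024) < INT32_LIMIT
```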
cc @sayakpaul @asomoza as they are involved in #10040. In addition to that issue, I think the silent failure mode could come up in other cases as well and would be difficult to track.