Skip to content

Conversation

@vguzov
Copy link

@vguzov vguzov commented Mar 1, 2018

Hello,
While working on transferring neural network architecture from PyTorch to Caffe I noticed a serious difference between PyTorch bilinear upsampling layer and a Caffe's deconvolution with bilinear kernel.
After some research, it turned out that PyTorch bilinear upsampling code clearly has one problem: Result depends on spatial size, which means that, for example, two similar parts of two images won't be the same after upsampling if these images have different sizes.
This bug is caused by pixel coordinate computation trick, which is pretty, but inaccurate.
Visual example of the bug and the fix can be viewed here: https://gist.github.com/Dorimer/e2389bfc4dd86c7ebbea3a1f45fc4327

@vguzov vguzov changed the title Changes in bilinear upsampling Bilinear upsampling fix Mar 2, 2018
@colesbury
Copy link
Member

@pytorchbot test this please

@ezyang
Copy link
Contributor

ezyang commented Mar 5, 2018

Hi @Dorimer, this is great! I know you have tested this fix visually, but I am wondering if there isn't a automatic test we can add to the test suite to prevent this from regressing if someone ever reimplements this kernel. Perhaps doing the allclose computation on the two differently sized inputs?

@ssnl
Copy link
Collaborator

ssnl commented Mar 19, 2018

@Dorimer If you don't have time working on adding a test, I am happy to write one.

@vguzov
Copy link
Author

vguzov commented Mar 19, 2018

@ssnl Yes, I tried to understand the testing system a couple of times, but as you said I don't have much time right now, so my progress is very slow. I'll be glad if you can add a test, thank you in advance =)

@ssnl
Copy link
Collaborator

ssnl commented Mar 21, 2018

The fix is not ideal. You end up with

>>> m = nn.Upsample(scale_factor=2, mode='bilinear')
>>> input = torch.arange(1, 5).view(1, 1, 2, 2)
>>> m(input)

(0 ,0 ,.,.) =
  1.0000  1.2500  1.7500  2.0000          # <-- this was  1, 1.33, 1.67, 2
  1.5000  1.7500  2.2500  2.5000
  2.5000  2.7500  3.2500  3.5000
  3.0000  3.2500  3.7500  4.0000
[torch.FloatTensor of size (1,1,4,4)]

>>> input

(0 ,0 ,.,.) =
  1  2
  3  4
[torch.FloatTensor of size (1,1,2,2)]

I suggest doing what tf does, i.e. having an align_corners kwarg. If align_corners=True, it does the current behavior by using (out-1)/(in-1) as ratio, which gives nice images. If align_corners=False, it does the correct but uglier thing, i.e. using out/in as ratio. Since this is in nn domain, it should be defaulted to False as what is done in tf.

Let me what you think. @Dorimer @soumith @colesbury

@vguzov
Copy link
Author

vguzov commented Mar 21, 2018

@ssnl

The fix is not ideal.

This is not an error, because it is a formal bilinear interpolation behaviour, which strictly follows the definition in Wikipedia. It is also identical with caffe's upsampling as shown here: https://gist.github.com/Dorimer/93470646cdac125b36ab54e742796571
Yes, it may give questionable results at a first glance, but I can show that this is correct. Let's look at the first row's pixels, their values and their horizontal coordinates (vertical can be omitted because we are next to the border). For example, the 2nd output pixel (located at 0.75) has a value of 1.25 because it is an interpolation between the 1st input pixel (located at 0.5 with value 1) and a 2nd input pixel (located at 1.5 with value 2). So, it is 1*(1-(0.75-0.5))+2*(1-(1.5-0.75))=0.75+0.5=1.25
If we talk about #5927, what if align_corners=False could be done with const Acctype h1r = (h2 > 0) ? rheight * (h2 - Acctype(0.5)) : Acctype(0); to preserve formal behaviour and align_corners=True with const Acctype h1r = rheight * h2;?

@ssnl
Copy link
Collaborator

ssnl commented Mar 22, 2018

@Dorimer I see. I think what you said makes sense. I'll think more about it tomorrow. Thanks!

@ssnl
Copy link
Collaborator

ssnl commented Mar 22, 2018

@Dorimer I agree that the src_idx computation needs to be updated, but I also think your formula is wrong. I don't know where you get the formula and it is different from the algorithm on wikipedia. The correct formula, I think, should be
src_idx = ratio * (dst_idx + 0.5) - 0.5. I updated #5927 to adopt this approach.

@ezyang ezyang added the module: bc-breaking Related to a BC-breaking change label Mar 23, 2018
@ssnl
Copy link
Collaborator

ssnl commented Mar 26, 2018

I believe this can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module: bc-breaking Related to a BC-breaking change open source

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants