🐛 Bug
When padding_mode='border' in grid_sample and a grid point falls exactly on the high boundary of the image (size - 1), the gradient should be based on the border padding scheme: either the gradient from just inside the boundary, or zero from just outside it (either could be valid, since it's a non-differentiable point). Instead, the gradient is currently calculated as if the image were zero-padded, which gives wacky results.
The same problem occurs with padding_mode='reflection' for 2D grid_sample on CPU.
Reflection mode in both the CUDA version and the 3D CPU version also has this problem, but there it is arguably worse, since the incorrect gradient is additionally negated. This also makes the CPU and CUDA kernels inconsistent with each other.
Example:
```python
import torch

image = torch.arange(0, 5, dtype=torch.float).expand((1, 1, 5, 5)).requires_grad_()
id_grid = torch.nn.functional.affine_grid(
    torch.tensor([[[1, 0, 0], [0, 1, 0.]]]), (1, 1, 5, 5), align_corners=True).requires_grad_()
torch.nn.functional.grid_sample(image, id_grid, padding_mode='border',
                                align_corners=True).sum().backward()
print(id_grid.grad.permute(0, 3, 1, 2))
```

```
tensor([[[[ 2.,  2.,  2.,  2., -8.],
          [ 2.,  2.,  2.,  2., -8.],
          [ 2.,  2.,  2.,  2., -8.],
          [ 2.,  2.,  2.,  2., -8.],
          [ 2.,  2.,  2.,  2., -8.]],

         [[ 0.,  0.,  0.,  0.,  0.],
          [ 0.,  0.,  0.,  0.,  0.],
          [ 0.,  0.,  0.,  0.,  0.],
          [ 0.,  0.,  0.,  0.,  0.],
          [ 0., -2., -4., -6., -8.]]]])
```

Notice the wacky last row and last column. This is because the gradient there is currently calculated as if the image were zero-padded: for example, the -8 in the last column is (0 - 4) * (W - 1)/2, i.e. a zero-padded out-of-bounds neighbor minus the in-bounds value 4, scaled by the grid-to-pixel factor 2.
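As a sanity check, one-sided finite differences of the forward pass at one of these boundary points give exactly the two values one could defend: the inner slope, or zero. A minimal sketch (the step size eps and the probed grid index are my choices, not part of the original repro):

```python
import torch
import torch.nn.functional as F

image = torch.arange(0, 5, dtype=torch.float).expand((1, 1, 5, 5))
grid = F.affine_grid(torch.tensor([[[1, 0, 0], [0, 1, 0.]]]),
                     (1, 1, 5, 5), align_corners=True)

def sampled_sum(g):
    return F.grid_sample(image, g, padding_mode='border',
                         align_corners=True).sum()

# Nudge the x coordinate of the top-right grid point, which lies exactly
# on the high boundary (normalized coordinate +1 -> pixel 4).
eps = 1e-3
g_in, g_out = grid.clone(), grid.clone()
g_in[0, 0, -1, 0] -= eps   # step just inside the image
g_out[0, 0, -1, 0] += eps  # step just outside, into the border padding

base = sampled_sum(grid)
print((base - sampled_sum(g_in)) / eps)   # ~2: slope from the inner side
print((sampled_sum(g_out) - base) / eps)  # ~0: border padding is flat outside
# The analytic backward instead reports -8 at this point.
```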
The result should ideally look like

```
tensor([[[[2., 2., 2., 2., 2.],
          [2., 2., 2., 2., 2.],
          [2., 2., 2., 2., 2.],
          [2., 2., 2., 2., 2.],
          [2., 2., 2., 2., 2.]],

         [[0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.]]]])
```

which finds the gradient using the in-bounds neighbor.
A less ideal, but still palatable, result would be

```
tensor([[[[2., 2., 2., 2., 0.],
          [2., 2., 2., 2., 0.],
          [2., 2., 2., 2., 0.],
          [2., 2., 2., 2., 0.],
          [2., 2., 2., 2., 0.]],

         [[0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.]]]])
```

which finds the gradient using the out-of-bounds, border-padded neighbor.
Reflection mode on CPU (for instance, running the same commands but with padding_mode='reflection') gives the exact same problematic result.
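Concretely, that is the repro above with only the padding mode changed (the grad reset line is mine, to avoid accumulating into id_grid.grad across runs):

```python
id_grid.grad = None  # clear the gradient left over from the border-mode run
torch.nn.functional.grid_sample(image, id_grid, padding_mode='reflection',
                                align_corners=True).sum().backward()
print(id_grid.grad.permute(0, 3, 1, 2))  # same problematic last row/column on CPU
```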
When using reflection mode on CUDA, however (as well as for 3D grid_sample on CPU), the problematic gradients are negated!

```
tensor([[[[2., 2., 2., 2., 8.],
          [2., 2., 2., 2., 8.],
          [2., 2., 2., 2., 8.],
          [2., 2., 2., 2., 8.],
          [2., 2., 2., 2., 8.]],

         [[0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [-0., 2., 4., 6., 8.]]]])
```

This is also problematic, of course, but even more so because of the mismatch between the CPU and CUDA behaviors.
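One way to surface the mismatch directly is to run the identical backward pass on both devices and compare. A sketch, assuming a CUDA build is available:

```python
import torch
import torch.nn.functional as F

def reflection_grid_grad(device):
    image = torch.arange(0, 5, dtype=torch.float, device=device).expand((1, 1, 5, 5))
    grid = F.affine_grid(torch.tensor([[[1, 0, 0], [0, 1, 0.]]], device=device),
                         (1, 1, 5, 5), align_corners=True).requires_grad_()
    F.grid_sample(image, grid, padding_mode='reflection',
                  align_corners=True).sum().backward()
    return grid.grad.cpu()

if torch.cuda.is_available():
    # False on the buggy commit: the boundary gradients differ in sign.
    print(torch.allclose(reflection_grid_grad('cpu'), reflection_grid_grad('cuda')))
```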
For reflection mode, I think it makes sense to set the gradient in such cases to zero, since the point sits at the apex of a symmetric hill. But taking the gradient from one side or the other might also be acceptable for most practical purposes.
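The apex intuition can be checked numerically: under reflection padding the sampled signal is symmetric about the boundary, so the two one-sided slopes there have equal magnitude and opposite sign. A sketch (single-point sampling; eps is my choice):

```python
import torch
import torch.nn.functional as F

image = torch.arange(0, 5, dtype=torch.float).expand((1, 1, 5, 5))

def sample_at(x):
    # Sample one point at normalized coordinates (x, 0).
    g = torch.tensor([[[[x, 0.]]]])
    return F.grid_sample(image, g, padding_mode='reflection',
                         align_corners=True).item()

eps = 1e-3
left = (sample_at(1.0) - sample_at(1.0 - eps)) / eps   # ~ +2
right = (sample_at(1.0 + eps) - sample_at(1.0)) / eps  # ~ -2
print(left, right)  # equal magnitude, opposite sign: a symmetric peak
```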
For border mode, by contrast, I think it makes more sense to always take the non-zero gradient from the inner side, since the outer-side gradient is zero and would effectively stop training (see the related discussion for clamp at #7002 and #7049).
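For comparison, this matches where clamp landed after those discussions: the gradient passes through at the boundary value itself and is zeroed only strictly outside the range. A minimal check, assuming current clamp semantics:

```python
import torch

x = torch.tensor([3.0, 4.0, 5.0], requires_grad=True)
x.clamp(max=4.0).sum().backward()
# Gradient flows at the boundary point x == 4; only the strictly
# out-of-range element gets zero.
print(x.grad)  # tensor([1., 1., 0.])
```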
PyTorch Version: tested on commit 0539462