Description
🐛 Bug
For some combinations of input size, pooling size, and padding, the output size of max_pool1d is wrong. Furthermore, it differs between CPU and CUDA.
To Reproduce
Steps to reproduce the behavior:
On GPU:
```python
import torch
x = torch.rand(19200000 - 2, dtype=torch.float32, device='cuda:0')
width = 1921
y = torch.nn.functional.max_pool1d(x[None, None], width, stride=1, padding=width//2)[0, 0]
print(x.shape)
print(y.shape)
```

I get:

```
torch.Size([19199998])
torch.Size([19199996])
```
On CPU:
```python
import torch
x = torch.rand(19200000 - 2, dtype=torch.float32, device='cpu')
width = 1921
y = torch.nn.functional.max_pool1d(x[None, None], width, stride=1, padding=width//2)[0, 0]
print(x.shape)
print(y.shape)
```

I get:

```
torch.Size([19199998])
torch.Size([19199997])
```
For an input length of 19200000, I get an output length of 19200000 on GPU and 19200001 on CPU.
Expected behavior
In all cases, the output shape should match the input shape: an odd-width pooling window with stride 1 and symmetric padding of width // 2 on each side leaves the length unchanged. The output shape should also be identical regardless of the device. Note that both CUDA and CPU produce the correct result for an input length of 1920000 - 2 (one order of magnitude smaller).
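For reference, the expected lengths follow from the standard pooling output-size formula, L_out = floor((L_in + 2 * padding - kernel_size) / stride) + 1. A minimal sketch of that arithmetic (the helper name `expected_len` is my own, not a PyTorch API):

```python
def expected_len(l_in, width, stride=1):
    """Expected max_pool1d output length per the standard formula,
    assuming symmetric padding of width // 2 on each side."""
    padding = width // 2
    return (l_in + 2 * padding - width) // stride + 1

# With an odd window and stride 1, the output length equals the input length:
print(expected_len(19200000 - 2, 1921))  # 19199998
print(expected_len(19200000, 1921))      # 19200000
```

So neither the 19199996 (CUDA) nor the 19199997 (CPU) result above matches the formula.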
Environment
I can reproduce this both with a precompiled PyTorch 0.4.1 and a self-compiled PyTorch 1.0.
Pre-compiled:
Collecting environment information...
PyTorch version: 0.4.1
Is debug build: No
CUDA used to build PyTorch: 9.0.176
OS: Ubuntu 16.04.4 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.5.1
Python version: 2.7
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: TITAN X (Pascal)
GPU 1: TITAN X (Pascal)
Nvidia driver version: 390.30
cuDNN version: Probably one of the following:
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.5
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.5.1.10
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn_static.a
Versions of relevant libraries:
[pip] Could not collect
[conda] Could not collect
Self-compiled:
PyTorch version: 1.0.0a0+710191e
Is debug build: No
CUDA used to build PyTorch: 10.0.130
OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.5.1
Python version: 2.7
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: GeForce GTX 1060
Nvidia driver version: 410.57
cuDNN version: Probably one of the following:
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudnn.so.7.3.1
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudnn_static.a
/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7.1.4
/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn_static.a
Versions of relevant libraries:
[pip] Could not collect
[conda] Could not collect
Additional context
For what it's worth, the same operation works fine with Lasagne/Theano on both CPU and CUDA.