Description
🐛 Bug
For some combinations of input size, pooling size, and padding, the output size of max_pool1d is wrong. Furthermore, it differs between CPU and CUDA.
To Reproduce
Steps to reproduce the behavior:
On GPU:
```python
import torch
x = torch.rand(19200000 - 2, dtype=torch.float32, device='cuda:0')
width = 1921
y = torch.nn.functional.max_pool1d(x[None, None], width, stride=1, padding=width//2)[0, 0]
print(x.shape)
print(y.shape)
```

I get:

```
torch.Size([19199998])
torch.Size([19199996])
```
On CPU:
```python
import torch
x = torch.rand(19200000 - 2, dtype=torch.float32, device='cpu')
width = 1921
y = torch.nn.functional.max_pool1d(x[None, None], width, stride=1, padding=width//2)[0, 0]
print(x.shape)
print(y.shape)
```

I get:

```
torch.Size([19199998])
torch.Size([19199997])
```
For an input length of 19200000, I get an output length of 19200000 on GPU and 19200001 on CPU.
Expected behavior
In all cases, the output shape should match the input shape: an odd-width pooling window with stride 1 and symmetric padding of width // 2 on each side leaves the length unchanged. The output shape should also be identical regardless of the device. Note that both CUDA and CPU produce the correct result for an input length of 1920000 - 2 (one order of magnitude smaller).
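For reference, the expected lengths follow from the standard pooling output-size formula, L_out = floor((L_in + 2 * padding - kernel_size) / stride) + 1. A minimal sketch of that arithmetic (the helper name `expected_len` is my own, not a PyTorch API):

```python
def expected_len(l_in, width, stride=1):
    """Expected max_pool1d output length per the standard formula,
    assuming symmetric padding of width // 2 on each side."""
    padding = width // 2
    return (l_in + 2 * padding - width) // stride + 1

# With an odd window and stride 1, the output length equals the input length:
print(expected_len(19200000 - 2, 1921))  # 19199998
print(expected_len(19200000, 1921))      # 19200000
```

So neither the 19199996 (CUDA) nor the 19199997 (CPU) result above matches the formula.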
Environment
I can reproduce this both with a precompiled PyTorch 0.4.1 and a self-compiled PyTorch 1.0.
Pre-compiled:
Collecting environment information...
PyTorch version: 0.4.1
Is debug build: No
CUDA used to build PyTorch: 9.0.176
OS: Ubuntu 16.04.4 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.5.1
Python version: 2.7
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: TITAN X (Pascal)
GPU 1: TITAN X (Pascal)
Nvidia driver version: 390.30
cuDNN version: Probably one of the following:
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.5
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.5.1.10
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn_static.a
Versions of relevant libraries:
[pip] Could not collect
[conda] Could not collect
Self-compiled:
PyTorch version: 1.0.0a0+710191e
Is debug build: No
CUDA used to build PyTorch: 10.0.130
OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.5.1
Python version: 2.7
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: GeForce GTX 1060
Nvidia driver version: 410.57
cuDNN version: Probably one of the following:
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudnn.so.7.3.1
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudnn_static.a
/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7.1.4
/usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn_static.a
Versions of relevant libraries:
[pip] Could not collect
[conda] Could not collect
Additional context
For what it's worth, the same operation works fine with Lasagne/Theano on both CPU and CUDA.