
Flip is much slower than advanced indexing #16424

@tridao

🐛 Bug

Flip is noticeably slower than advanced indexing, roughly 2.5x on CPU and over 6x on CUDA in the timings below, even though advanced indexing is a more general operation.
I have tested this on CPU and CUDA, on both PyTorch 1.0 stable and pytorch-nightly.

To Reproduce

import time
import torch

n = 1024
batch_size = 256
ntrials = 1000

x = torch.randn(batch_size, n)

# Time torch.flip on CPU
start = time.perf_counter()
[x.flip(-1) for _ in range(ntrials)]
end = time.perf_counter()
print('Flip time (CPU): {}s'.format(end - start))

# Time advanced indexing with a precomputed reverse index on CPU
reverse_index = torch.arange(n - 1, -1, -1)
start = time.perf_counter()
[x[..., reverse_index] for _ in range(ntrials)]
end = time.perf_counter()
print('Advanced indexing time (CPU): {}s'.format(end - start))

# Repeat on CUDA; synchronize so the timings reflect kernel completion
x = x.to('cuda')
reverse_index = reverse_index.to('cuda')

torch.cuda.synchronize()
start = time.perf_counter()
[x.flip(-1) for _ in range(ntrials)]
torch.cuda.synchronize()
end = time.perf_counter()
print('Flip time (CUDA): {}s'.format(end - start))

start = time.perf_counter()
[x[..., reverse_index] for _ in range(ntrials)]
torch.cuda.synchronize()
end = time.perf_counter()
print('Advanced indexing time (CUDA): {}s'.format(end - start))
Output:

Flip time (CPU): 0.6906896363943815s
Advanced indexing time (CPU): 0.2781159598380327s
Flip time (CUDA): 0.1045754998922348s
Advanced indexing time (CUDA): 0.016148101538419724s
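
For reference, both expressions produce the same reversed tensor, so the timings compare equivalent work; a minimal sanity check (not part of the original script) is:

# Hypothetical check: flip and advanced indexing give identical results.
assert torch.equal(x.flip(-1), x[..., reverse_index])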

Expected behavior

Flip should be at least as fast as advanced indexing, since it is the more specialized operation. Right now I have to use advanced indexing for speed, which makes the code less readable (see the sketch below).
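
A minimal sketch of that workaround, which keeps call sites readable by hiding the index bookkeeping in a helper (the function flip_last_dim and its per-(length, device) cache are hypothetical, not a PyTorch API):

import torch

_reverse_index_cache = {}

def flip_last_dim(x):
    # Hypothetical helper: reverse the last dimension via advanced indexing,
    # caching the index tensor per (length, device) so it is built only once.
    key = (x.size(-1), x.device)
    idx = _reverse_index_cache.get(key)
    if idx is None:
        idx = torch.arange(x.size(-1) - 1, -1, -1, device=x.device)
        _reverse_index_cache[key] = idx
    return x[..., idx]

y = flip_last_dim(x)  # same result as x.flip(-1), using the faster indexing path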

Environment

PyTorch version: 1.0.0.dev20190123
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.2 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration: GPU 0: Tesla P100-PCIE-16GB
Nvidia driver version: 390.42
cuDNN version: Probably one of the following:
/usr/local/cuda-8.0/lib64/libcudnn.so.6.0.21
/usr/local/cuda-8.0/lib64/libcudnn_static.a
/usr/local/cuda-9.0/lib64/libcudnn.so.7.0.4
/usr/local/cuda-9.0/lib64/libcudnn_static.a

Versions of relevant libraries:
[pip] Could not collect
[conda] blas 1.0 mkl
[conda] mkl 2019.1 144
[conda] mkl-service 1.1.2 py37he904b0f_5
[conda] mkl_fft 1.0.6 py37hd81dba3_0
[conda] mkl_random 1.0.2 py37hd81dba3_0
[conda] pytorch 1.0.0 py3.7_cuda9.0.176_cudnn7.4.1_1 pytorch
[conda] pytorch-nightly 1.0.0.dev20190123 py3.7_cuda9.0.176_cudnn7.4.1_0 pytorch
[conda] torchvision 0.2.1 py_2 pytorch

cc @VitalyFedyunin @ngimel

Metadata

Labels

module: performance (Issues related to performance, either of kernel code or framework glue)
module: viewing and reshaping
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
