
Flip is much slower than advanced indexing #16424

@tridao

🐛 Bug

Flip is noticeably slower than advanced indexing, roughly 2.5x on CPU and over 6x on CUDA in the timings below, even though advanced indexing is a more general operation.
I have tested this on CPU and CUDA, on both PyTorch 1.0 stable and pytorch-nightly.

To Reproduce

import time
import torch

n = 1024
batch_size = 256
ntrials = 1000

x = torch.randn(batch_size, n)

# Time torch.flip on CPU
start = time.perf_counter()
[x.flip(-1) for _ in range(ntrials)]
end = time.perf_counter()
print('Flip time (CPU): {}s'.format(end - start))

# Time advanced indexing with a precomputed reverse index on CPU
reverse_index = torch.arange(n - 1, -1, -1)
start = time.perf_counter()
[x[..., reverse_index] for _ in range(ntrials)]
end = time.perf_counter()
print('Advanced indexing time (CPU): {}s'.format(end - start))

# Repeat on CUDA; synchronize so the timings reflect kernel completion
x = x.to('cuda')
reverse_index = reverse_index.to('cuda')

torch.cuda.synchronize()
start = time.perf_counter()
[x.flip(-1) for _ in range(ntrials)]
torch.cuda.synchronize()
end = time.perf_counter()
print('Flip time (CUDA): {}s'.format(end - start))

start = time.perf_counter()
[x[..., reverse_index] for _ in range(ntrials)]
torch.cuda.synchronize()
end = time.perf_counter()
print('Advanced indexing time (CUDA): {}s'.format(end - start))
Output:

Flip time (CPU): 0.6906896363943815s
Advanced indexing time (CPU): 0.2781159598380327s
Flip time (CUDA): 0.1045754998922348s
Advanced indexing time (CUDA): 0.016148101538419724s
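
For reference, both expressions produce the same reversed tensor, so the timings compare equivalent work; a minimal sanity check (not part of the original script) is:

# Hypothetical check: flip and advanced indexing give identical results.
assert torch.equal(x.flip(-1), x[..., reverse_index])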

Expected behavior

Flip should be at least as fast as advanced indexing, since it is the more specialized operation. Right now I have to use advanced indexing for speed, which makes the code less readable (see the sketch below).
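
A minimal sketch of that workaround, which keeps call sites readable by hiding the index bookkeeping in a helper (the function flip_last_dim and its per-(length, device) cache are hypothetical, not a PyTorch API):

import torch

_reverse_index_cache = {}

def flip_last_dim(x):
    # Hypothetical helper: reverse the last dimension via advanced indexing,
    # caching the index tensor per (length, device) so it is built only once.
    key = (x.size(-1), x.device)
    idx = _reverse_index_cache.get(key)
    if idx is None:
        idx = torch.arange(x.size(-1) - 1, -1, -1, device=x.device)
        _reverse_index_cache[key] = idx
    return x[..., idx]

y = flip_last_dim(x)  # same result as x.flip(-1), using the faster indexing path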

Environment

PyTorch version: 1.0.0.dev20190123
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.2 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration: GPU 0: Tesla P100-PCIE-16GB
Nvidia driver version: 390.42
cuDNN version: Probably one of the following:
/usr/local/cuda-8.0/lib64/libcudnn.so.6.0.21
/usr/local/cuda-8.0/lib64/libcudnn_static.a
/usr/local/cuda-9.0/lib64/libcudnn.so.7.0.4
/usr/local/cuda-9.0/lib64/libcudnn_static.a

Versions of relevant libraries:
[pip] Could not collect
[conda] blas 1.0 mkl
[conda] mkl 2019.1 144
[conda] mkl-service 1.1.2 py37he904b0f_5
[conda] mkl_fft 1.0.6 py37hd81dba3_0
[conda] mkl_random 1.0.2 py37hd81dba3_0
[conda] pytorch 1.0.0 py3.7_cuda9.0.176_cudnn7.4.1_1 pytorch
[conda] pytorch-nightly 1.0.0.dev20190123 py3.7_cuda9.0.176_cudnn7.4.1_0 pytorch
[conda] torchvision 0.2.1 py_2 pytorch

cc @VitalyFedyunin @ngimel

Metadata

Labels

module: performance (Issues related to performance, either of kernel code or framework glue)
module: viewing and reshaping
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
