[WIP] Flip a tensor (CPU + CUDA implementation) #6867

Conversation
added stress testing for flip
ssnl left a comment:
Not a thorough review. Just noticed a couple of things.
test/test_torch.py (outdated)
    @staticmethod
    def _test_flip(self, use_cuda=False):
        if use_cuda and torch.cuda.is_available():
            x = torch.Tensor([1,2,3,4,5,6,7,8]).view(2, 2, 2).cuda()
test/test_torch.py (outdated)
    self.assertEqual(x.size(), orig)
test/test_autograd.py (outdated)
    out.sum().backward()
    self.assertEqual(x.grad.data, y_data)

    def test_flip(self):
test/test_autograd.py (outdated)
    def test_flip(self):
        x = torch.autograd.Variable(torch.Tensor([0,1,2,3]).view(2, 2), requires_grad=True)
    Tensor res = self.clone();
    for (auto d : dims) {
      res.copy_(reverse_dim(res, d));
    // check that the number of axes in dims is valid
    if (dims.size() == 0) {
      std::stringstream ss;
      ss << "CUDA: expected dims not empty, "
      return result;
    }

    __device__ void oneD_to_nD(int64_t oneD_index, int64_t* shape_size, int64_t shape_len, int64_t* nD_index) {
      }
    }

    __device__ int64_t nD_to_oneD(int64_t* nD_index, int64_t shape_len, int64_t* shape_size, int64_t src_oneD_index) {
    cudaMalloc(&d_dims_t, dims_len * sizeof(int64_t));
    cudaMemcpy(d_dims_t, dims_t.data<int64_t>(), dims_len * sizeof(int64_t), cudaMemcpyHostToDevice);

    Tensor shape = at::zeros(CPU(kLong), {shape_len});
Can I ask a maybe naive question? Couldn't this be done with index_select?

@fmassa I am guessing index_select works on one dimension at a time. When it comes to flipping a tensor along many dimensions, the cost of launching CUDA kernels multiple times can be avoided by a customized implementation.
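As an illustration of that trade-off, here is a minimal sketch (my example, not code from this PR) of a flip composed from index_select: each flipped dimension needs its own reversed index tensor and its own index_select call, so on CUDA the number of kernel launches grows with the number of flipped dimensions.

    import torch

    def flip_via_index_select(x, dims):
        # Illustrative helper, not the PR's implementation: one arange and
        # one index_select per flipped dimension; a dedicated flip kernel
        # would instead read and write each element exactly once.
        out = x
        for d in dims:
            rev = torch.arange(x.size(d) - 1, -1, -1, device=x.device)
            out = out.index_select(d, rev)
        return out

    x = torch.Tensor([1, 2, 3, 4, 5, 6, 7, 8]).view(2, 2, 2)
    print(flip_via_index_select(x, [0, 2]))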
I'd rather support negative step-size in indexing instead of dedicated flip kernels. NumPy's flip is implemented as a negative-step slice.
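For reference, the NumPy behaviour alluded to: a flip there is just a negative-step basic slice, i.e. a view rather than a copy (a short sketch of mine, not part of this PR).

    import numpy as np

    a = np.arange(6).reshape(2, 3)
    b = a[::-1, ::-1]                  # reversed view; no data is copied
    assert np.shares_memory(a, b)
    assert (b == np.flip(np.flip(a, 0), 1)).all()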
Another option is to use …

@colesbury upvote for negative step-size indexing.

Supporting negative strides would indeed be a very good addition, but I don't think it is an easy endeavour. @ezyang is planning to add negative stride support in c10, so maybe he can comment on that.
…instead of dim copies at CPU implementation
I'd avoid blocking on this work. We plan on making negative strides work as we start earnestly porting all of TH into C10/ATen, but that porting work is going to take some time. |
I reiterate my question here: can't we use index_select?

Maybe you could have a look at https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_variable_indexing.cpp#L188 ? It initializes the index tensor used in index_select.

Ah, wait. I think I'm wrong here, and the desired behaviour is actually to launch several index_select calls.

I think you could still use index_select.

@fmassa I see, thanks a lot for looking into this!
    __device__ __forceinline__
    void oneD_to_nD(int64_t oneD_index, int64_t* shape_size, int64_t shape_len, int64_t* nD_index) {
      int64_t res = oneD_index;
      for (int i = 0; i < shape_len; i++) {
      Here element 3 has nD index = (0,1,0), and this corresponds to oneD index = 2
    */
    __device__ __forceinline__
    int64_t nD_to_oneD(int64_t* nD_index, int64_t shape_len, int64_t* shape_size, int64_t src_oneD_index) {
      int64_t dest_oneD_index = 0;
      for (int i = 0; i < shape_len; i++) {
      for (int i = 0; i < dims_len; i++) {
        int64_t d = dims[i];
        int64_t nD_d = oneD_index * shape_len + d;
        nD_index[nD_d] = shape[d] - 1 - nD_index[nD_d];
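To make the index arithmetic in these excerpts concrete, here is a small Python model of the same round trip (my own sketch, assuming a contiguous row-major layout; not the PR's code): decompose a linear index into an nD index, reflect the flipped dimensions with shape[d] - 1 - index[d], and recompose. For shape (2, 2, 2) it maps linear index 2 to nD index (0, 1, 0), matching the example quoted above.

    def linear_to_nd(linear, shape):
        # Row-major decomposition of a linear index into per-dimension indices.
        nd = []
        for size in reversed(shape):
            nd.append(linear % size)
            linear //= size
        return list(reversed(nd))

    def nd_to_linear(nd, shape):
        # Inverse of linear_to_nd for a contiguous layout.
        linear = 0
        for idx, size in zip(nd, shape):
            linear = linear * size + idx
        return linear

    shape, dims = (2, 2, 2), (0, 2)
    out_idx = 2                        # output slot; nD index (0, 1, 0)
    nd = linear_to_nd(out_idx, shape)
    for d in dims:                     # reflect the flipped dimensions
        nd[d] = shape[d] - 1 - nd[d]
    in_idx = nd_to_linear(nd, shape)   # element copied into this output slot
    print(nd, in_idx)                  # [1, 1, 1] 7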
      return dest_oneD_index;
    }

    template <typename T>
      out_t[oneD_index] = in_t[dest_oneD_index];
    }

    Tensor flip_cuda(const Tensor& self, IntList dims) {
@fmassa What do you think about the patch now?
    Tensor flip_cuda(const Tensor& self, IntList dims) {

      // TODO: allow non-contiguous tensors
      self.contiguous();
      Tensor shape_t = at::zeros(CPU(kLong), {total_dims});
      int64_t* shape_t_d = shape_t.data<int64_t>();
      for (int64_t i = 0; i < total_dims; i++) {
        shape_t_d[i] = shape[i];
      Tensor each_dim_len = at::zeros(CPU(kLong), {total_dims});
      Tensor out_t = self.clone();
      for (auto d : dims) {
        out_t.copy_(reverse_dim(out_t, d));
test/test_autograd.py (outdated)
    ('reshape', (S,), (S,), '1d'),
    ('reshape', (), (dont_convert(()),), 'scalar_to_scalar'),
    ('reshape', (), (1,), 'scalar_to_1d'),
    ('flip', torch.rand(S, S, S).requires_grad_(), ([0],), 'd0'),
ezyang left a comment.
      }

      Tensor indices = at::zeros(CUDA(kLong), {N, total_dims});
      Tensor out_t = self.clone();
        return;
      }

      linear_index_to_indices(linear_index, each_dim_len, total_dims, indices);
So it's strides, considering the current code only works on contiguous tensors. Can we just use strides, as I suggested above?

On Wed, May 2, 2018 at 01:18, Wei Yang wrote:

In aten/src/ATen/native/cuda/TensorTransformations.cu (#6867 (comment)):

    + }
    +
    + Tensor flip_dims_t = at::zeros(CPU(kLong), {flip_dims_size});
    + int64_t* flip_dims_t_d = flip_dims_t.data<int64_t>();
    + for (int64_t i = 0; i < flip_dims_size; i++) {
    +   flip_dims_t_d[i] = dims[i];
    + }
    +
    + auto shape = self.sizes();
    + Tensor shape_t = at::zeros(CPU(kLong), {total_dims});
    + int64_t* shape_t_d = shape_t.data<int64_t>();
    + for (int64_t i = 0; i < total_dims; i++) {
    +   shape_t_d[i] = shape[i];
    + }
    +
    + Tensor each_dim_len = at::zeros(CPU(kLong), {total_dims});

oops... sorry, I misinterpreted the question. "sizes of a tensor" is actually the same as "shape" here. "each_dim_len" is different: it is the total number of elements in a subarray of the current dimension. In the example t = [[1,2], [3,4], [5,6]], I visualize its 1st dim as rows and its 2nd dim as columns. If the current dimension is 0 (rows), then each_dim_len[0] = 2; if the current dimension is 1 (columns), then each_dim_len[1] = 1. So here each_dim_len = (2, 1). "each_dim_len" is mainly used for the conversion between indices and linear_index.
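To connect the two descriptions, a short sketch of mine (not from the PR): for a contiguous tensor, each_dim_len as defined above is exactly the row-major stride of each dimension, which is what the "just use strides" suggestion points at.

    import torch

    def each_dim_len(shape):
        # Elements spanned by one step along each dimension,
        # i.e. the contiguous (row-major) strides.
        out, running = [], 1
        for size in reversed(shape):
            out.append(running)
            running *= size
        return list(reversed(out))

    t = torch.Tensor([[1, 2], [3, 4], [5, 6]])
    print(each_dim_len(tuple(t.size())))  # [2, 1]
    print(t.stride())                     # (2, 1) for this contiguous tensor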
@ssnl sure, I will work on the non-contiguous case. For the nD indices array, please correct me if I am wrong: it is easy to work with this abstraction because even with strides and a linear index we still need some transformation steps (with for loops) to compute the destination linear index, so things might be more transparent using an nD indices array.
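For comparison, a hypothetical stride-only version of the same mapping (my sketch, not proposed code); it avoids materializing an nD index array but, as noted above, still needs the per-dimension loop.

    def flipped_source_index(out_idx, sizes, strides, flip_dims):
        # Peel one coordinate per dimension off the linear index,
        # reflect it if that dimension is flipped, then re-accumulate.
        in_idx, rem = 0, out_idx
        for d, (size, stride) in enumerate(zip(sizes, strides)):
            coord, rem = rem // stride, rem % stride
            if d in flip_dims:
                coord = size - 1 - coord
            in_idx += coord * stride
        return in_idx

    # shape (2, 2, 2), contiguous strides (4, 2, 1), flipping dims 0 and 2
    print(flipped_source_index(2, (2, 2, 2), (4, 2, 1), {0, 2}))  # 7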
Any news on this PR?

@ivan-bilan still trying to handle non-contiguous input in the CUDA implementation, will update in 1-2 days.

For reasons I don't understand, this branch doesn't compile flip() along with the other code for me; I am getting the error AttributeError: 'Tensor' object has no attribute 'flip'. I will close this PR and create a new one.

Replaced by #7873
Summary:
Usage:
x = torch.arange(6).view(2, 3)
y = x.flip(0, 1) # flip along the 1st and 2nd dimensions
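For concreteness, a runnable version of that usage with the expected values worked out by hand:

    import torch

    x = torch.arange(6).view(2, 3)   # [[0, 1, 2],
                                     #  [3, 4, 5]]
    y = x.flip(0, 1)                 # flip along the 1st and 2nd dimensions
    print(y)                         # [[5, 4, 3],
                                     #  [2, 1, 0]]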