
Conversation

@weiyangfb (Contributor) commented Apr 23, 2018:

Summary:

  1. fixes "flip a Tensor" (#229): implemented torch.flip() to reverse a tensor along the specified dimensions
  2. implemented forward and backward functions for both CPU and CUDA
  3. added tests in test_torch, test_cuda, and test_autograd

Usage:
x = torch.arange(6).view(2, 3)
y = x.flip(0, 1) # flip along the 1st and 2nd dimensions
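
For reference, a quick sanity check of the expected result with the API proposed above (flipping both dimensions reverses the element order end to end):

import torch

x = torch.arange(6).view(2, 3)   # [[0, 1, 2], [3, 4, 5]]
y = x.flip(0, 1)                 # reverse along dims 0 and 1
assert y.tolist() == [[5, 4, 3], [2, 1, 0]]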

@ssnl (Collaborator) left a review:

Not a thorough review. Just noticed a couple of things.

@fmassa (Member) commented Apr 24, 2018:

Can I ask a maybe naive question?
Why do we need a dedicated implementation for CUDA? Can't we directly dispatch to index_select, which has a (fairly) optimized implementation for CUDA already?
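
For illustration, a rough Python-level sketch of the alternative being asked about, i.e. dispatching to index_select once per flipped dimension (the helper name is illustrative, not the implementation in this PR):

import torch

def flip_via_index_select(x, dims):
    # Reverse x along each requested dim by gathering with a reversed index.
    for d in dims:
        rev = torch.arange(x.size(d) - 1, -1, -1, dtype=torch.long, device=x.device)
        x = x.index_select(d, rev)
    return x

x = torch.arange(6).view(2, 3)
assert flip_via_index_select(x, (0, 1)).tolist() == [[5, 4, 3], [2, 1, 0]]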

@weiyangfb (Contributor, Author):

@fmassa I am guessing index_select works on one dimension at a time. When flipping a tensor along many dimensions, a customized implementation can avoid the cost of launching a CUDA kernel once per dimension.

@colesbury (Member):

I'd rather support negative step-size in indexing instead of dedicated flip kernels. NumPy's flip implementation is only a few lines:

https://github.com/numpy/numpy/blob/6a58e25703cbecb6786faa09a04ae2ec8221348b/numpy/lib/function_base.py#L202-L210
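
For context, the linked NumPy implementation essentially builds a tuple of slices with a negative step on the flipped axes; a condensed Python sketch of that approach (the helper name is illustrative):

import numpy as np

def flip_with_negative_step(a, dims):
    # slice(None, None, -1) on each flipped axis, slice(None) elsewhere;
    # this relies on exactly the negative-step indexing PyTorch lacks here.
    indexer = [slice(None)] * a.ndim
    for d in dims:
        indexer[d] = slice(None, None, -1)
    return a[tuple(indexer)]

a = np.arange(6).reshape(2, 3)
assert (flip_with_negative_step(a, (0, 1)) == a[::-1, ::-1]).all()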

@fmassa (Member) commented Apr 24, 2018:

Another option is to use tensor.index, which performs advanced indexing on several dimensions.

@weiyangfb (Contributor, Author):

@colesbury upvote for negative step-size indexing

@fmassa (Member) commented Apr 24, 2018:

Supporting negative strides would indeed be a very good addition, but I don't think it is an easy endeavour. @ezyang is planning to add negative-stride support in c10, so maybe he can comment on that.

@ezyang (Contributor) commented Apr 25, 2018:

I'd avoid blocking on this work. We plan on making negative strides work as we start earnestly porting all of TH into C10/ATen, but that porting work is going to take some time.

@fmassa (Member) commented Apr 25, 2018:

I reiterate my question: can't we use tensor.index here? It's a generalization of index_select for multiple dimensions, and it would avoid the need for dedicated cpp/cuda implementations.

@fmassa (Member) commented Apr 25, 2018:

Maybe you could have a look at https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_variable_indexing.cpp#L188? It initializes the index tensor used in tensor.index. Let me know if you don't manage to use it; I'm typing from my phone now.

@fmassa (Member) commented Apr 25, 2018:

Ah, wait. I think I'm wrong here, and the desired behaviour is actually to launch several index_select kernels, in which case your implementation might be better

@fmassa (Member) commented Apr 25, 2018:

I think you could still use index by creating the indices similarly to meshgrid, but this might just be less efficient.
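
A hedged sketch of that meshgrid-style idea, written with public advanced indexing rather than the internal tensor.index, only to illustrate how broadcastable index tensors could be built (the helper name is illustrative):

import torch

def flip_via_advanced_indexing(x, dims):
    # One broadcastable index tensor per dimension, reversed on flipped dims;
    # broadcasting expands them to the full meshgrid of positions.
    idx = []
    for d in range(x.dim()):
        n = x.size(d)
        r = torch.arange(n - 1, -1, -1) if d in dims else torch.arange(n)
        shape = [1] * x.dim()
        shape[d] = n
        idx.append(r.view(shape))
    return x[tuple(idx)]

x = torch.arange(6).view(2, 3)
assert flip_via_advanced_indexing(x, (0, 1)).tolist() == [[5, 4, 3], [2, 1, 0]]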

@weiyangfb (Contributor, Author):

@fmassa I see, thanks a lot for looking into this!

@ezyang (Contributor) commented Apr 27, 2018:

@fmassa What do you think about the patch now?

@ezyang (Contributor) left a review.

@ssnl (Collaborator) commented May 1, 2018 via email.

@weiyangfb (Contributor, Author):

@ssnl Sure, I will work on the non-contiguous case. As for the nD indices array, please correct me if I am wrong: it is easy to work with this abstraction because, even with strides and a linear index, we still need some transformation steps (with for loops) to compute the destination linear index, and things might be more transparent when using an nD indices array.
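
To make the trade-off concrete, a small Python sketch of the index arithmetic involved for a contiguous tensor (an illustration of the linear-index/nD-index round trip being discussed, not the actual CUDA kernel; the helper name is illustrative):

def flipped_linear_index(linear_index, shape, dims):
    # Decompose the contiguous linear index into per-dimension coordinates,
    # mirror the coordinates on the flipped dims, then recompose.
    coords = []
    rem = linear_index
    for size in reversed(shape):
        coords.append(rem % size)
        rem //= size
    coords.reverse()
    for d in dims:
        coords[d] = shape[d] - 1 - coords[d]
    out = 0
    for size, c in zip(shape, coords):
        out = out * size + c
    return out

# the element at nD index (0, 1, 0) of a (2, 2, 2) tensor has linear index 2;
# flipping dim 1 sends it to (0, 0, 0), i.e. linear index 0
assert flipped_linear_index(2, (2, 2, 2), (1,)) == 0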

@ivan-bilan:

Any news on this PR?

@weiyangfb (Contributor, Author):

@ivan-bilan Still trying to handle non-contiguous input in the CUDA implementation; I will update in 1-2 days.

@weiyangfb (Contributor, Author):

For reasons I don't understand, this branch doesn't compile flip() along with the other code for me. I am getting the error: AttributeError: 'Tensor' object has no attribute 'flip'. I will close this PR and create a new one.

@weiyangfb closed this on May 26, 2018.
@weiyangfb (Contributor, Author):

Replaced by #7873.
