[WIP] Flip a tensor (CPU + CUDA implementation) #6867

Conversation
added stress testing for flip
ssnl left a comment:
Not a thorough review. Just noticed a couple of things.
test/test_torch.py (outdated)
    @staticmethod
    def _test_flip(self, use_cuda=False):
        if use_cuda and torch.cuda.is_available():
            x = torch.Tensor([1,2,3,4,5,6,7,8]).view(2, 2, 2).cuda()
test/test_torch.py (outdated)
    self.assertEqual(x.size(), orig)
test/test_autograd.py (outdated)
    out.sum().backward()
    self.assertEqual(x.grad.data, y_data)

    def test_flip(self):
test/test_autograd.py (outdated)
    def test_flip(self):
        x = torch.autograd.Variable(torch.Tensor([0,1,2,3]).view(2, 2), requires_grad=True)
    Tensor res = self.clone();
    for (auto d : dims) {
      res.copy_(reverse_dim(res, d));
    // check that the number of axes in dims is valid
    if (dims.size() == 0) {
      std::stringstream ss;
      ss << "CUDA: expected dims not empty, "
      return result;
    }

    __device__ void oneD_to_nD(int64_t oneD_index, int64_t* shape_size, int64_t shape_len, int64_t* nD_index) {
      }
    }

    __device__ int64_t nD_to_oneD(int64_t* nD_index, int64_t shape_len, int64_t* shape_size, int64_t src_oneD_index) {
    cudaMalloc(&d_dims_t, dims_len * sizeof(int64_t));
    cudaMemcpy(d_dims_t, dims_t.data<int64_t>(), dims_len * sizeof(int64_t), cudaMemcpyHostToDevice);

    Tensor shape = at::zeros(CPU(kLong), {shape_len});
Can I ask a maybe naive question? Couldn't this be done with index_select?

@fmassa I am guessing index_select works on one dimension at a time. When it comes to flipping a tensor along many dimensions, the cost of launching CUDA kernels multiple times can be avoided by a customized implementation.
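As an illustration of that trade-off, here is a minimal sketch (my example, not code from this PR) of a flip composed from index_select: each flipped dimension needs its own reversed index tensor and its own index_select call, so on CUDA the number of kernel launches grows with the number of flipped dimensions.

    import torch

    def flip_via_index_select(x, dims):
        # Illustrative helper, not the PR's implementation: one arange and
        # one index_select per flipped dimension; a dedicated flip kernel
        # would instead read and write each element exactly once.
        out = x
        for d in dims:
            rev = torch.arange(x.size(d) - 1, -1, -1, device=x.device)
            out = out.index_select(d, rev)
        return out

    x = torch.Tensor([1, 2, 3, 4, 5, 6, 7, 8]).view(2, 2, 2)
    print(flip_via_index_select(x, [0, 2]))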
I'd rather support negative step-size in indexing instead of dedicated flip kernels. NumPy's flip is implemented as a negative-step slice.
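For reference, the NumPy behaviour alluded to: a flip there is just a negative-step basic slice, i.e. a view rather than a copy (a short sketch of mine, not part of this PR).

    import numpy as np

    a = np.arange(6).reshape(2, 3)
    b = a[::-1, ::-1]                  # reversed view; no data is copied
    assert np.shares_memory(a, b)
    assert (b == np.flip(np.flip(a, 0), 1)).all()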
Another option is to use …

@colesbury upvote for negative step-size indexing.

Supporting negative strides would indeed be a very good addition, but I don't think it is an easy endeavour. @ezyang is planning to add negative stride support in c10, so maybe he can comment on that.
…instead of dim copies at CPU implementation
I'd avoid blocking on this work. We plan on making negative strides work as we start earnestly porting all of TH into C10/ATen, but that porting work is going to take some time. |
I reiterate my question here: can't we use index_select?

Maybe you could have a look at https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_variable_indexing.cpp#L188 ? It initializes the index tensor used in index_select.

Ah, wait. I think I'm wrong here, and the desired behaviour is actually to launch several index_select calls.

I think you could still use index_select.

@fmassa I see, thanks a lot for looking into this!
    __device__ __forceinline__
    void oneD_to_nD(int64_t oneD_index, int64_t* shape_size, int64_t shape_len, int64_t* nD_index) {
      int64_t res = oneD_index;
      for (int i = 0; i < shape_len; i++) {
      Here element 3 has nD index = (0,1,0), and this corresponds to oneD index = 2
    */
    __device__ __forceinline__
    int64_t nD_to_oneD(int64_t* nD_index, int64_t shape_len, int64_t* shape_size, int64_t src_oneD_index) {
      int64_t dest_oneD_index = 0;
      for (int i = 0; i < shape_len; i++) {
      for (int i = 0; i < dims_len; i++) {
        int64_t d = dims[i];
        int64_t nD_d = oneD_index * shape_len + d;
        nD_index[nD_d] = shape[d] - 1 - nD_index[nD_d];
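To make the index arithmetic in these excerpts concrete, here is a small Python model of the same round trip (my own sketch, assuming a contiguous row-major layout; not the PR's code): decompose a linear index into an nD index, reflect the flipped dimensions with shape[d] - 1 - index[d], and recompose. For shape (2, 2, 2) it maps linear index 2 to nD index (0, 1, 0), matching the example quoted above.

    def linear_to_nd(linear, shape):
        # Row-major decomposition of a linear index into per-dimension indices.
        nd = []
        for size in reversed(shape):
            nd.append(linear % size)
            linear //= size
        return list(reversed(nd))

    def nd_to_linear(nd, shape):
        # Inverse of linear_to_nd for a contiguous layout.
        linear = 0
        for idx, size in zip(nd, shape):
            linear = linear * size + idx
        return linear

    shape, dims = (2, 2, 2), (0, 2)
    out_idx = 2                        # output slot; nD index (0, 1, 0)
    nd = linear_to_nd(out_idx, shape)
    for d in dims:                     # reflect the flipped dimensions
        nd[d] = shape[d] - 1 - nd[d]
    in_idx = nd_to_linear(nd, shape)   # element copied into this output slot
    print(nd, in_idx)                  # [1, 1, 1] 7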
      return dest_oneD_index;
    }

    template <typename T>
      out_t[oneD_index] = in_t[dest_oneD_index];
    }

    Tensor flip_cuda(const Tensor& self, IntList dims) {
@fmassa What do you think about the patch now?
    Tensor flip_cuda(const Tensor& self, IntList dims) {

      // TODO: allow non-contiguous tensors
      self.contiguous();
      Tensor shape_t = at::zeros(CPU(kLong), {total_dims});
      int64_t* shape_t_d = shape_t.data<int64_t>();
      for (int64_t i = 0; i < total_dims; i++) {
        shape_t_d[i] = shape[i];
      Tensor each_dim_len = at::zeros(CPU(kLong), {total_dims});
      Tensor out_t = self.clone();
      for (auto d : dims) {
        out_t.copy_(reverse_dim(out_t, d));
test/test_autograd.py (outdated)
    ('reshape', (S,), (S,), '1d'),
    ('reshape', (), (dont_convert(()),), 'scalar_to_scalar'),
    ('reshape', (), (1,), 'scalar_to_1d'),
    ('flip', torch.rand(S, S, S).requires_grad_(), ([0],), 'd0'),
ezyang left a comment.
      }

      Tensor indices = at::zeros(CUDA(kLong), {N, total_dims});
      Tensor out_t = self.clone();
        return;
      }

      linear_index_to_indices(linear_index, each_dim_len, total_dims, indices);
So it's strides, considering the current code only works on contiguous tensors. Can we just use strides, as I suggested above?

On Wed, May 2, 2018 at 01:18, Wei Yang wrote:

In aten/src/ATen/native/cuda/TensorTransformations.cu (#6867 (comment)):

    + }
    +
    + Tensor flip_dims_t = at::zeros(CPU(kLong), {flip_dims_size});
    + int64_t* flip_dims_t_d = flip_dims_t.data<int64_t>();
    + for (int64_t i = 0; i < flip_dims_size; i++) {
    +   flip_dims_t_d[i] = dims[i];
    + }
    +
    + auto shape = self.sizes();
    + Tensor shape_t = at::zeros(CPU(kLong), {total_dims});
    + int64_t* shape_t_d = shape_t.data<int64_t>();
    + for (int64_t i = 0; i < total_dims; i++) {
    +   shape_t_d[i] = shape[i];
    + }
    +
    + Tensor each_dim_len = at::zeros(CPU(kLong), {total_dims});

oops... sorry, I misinterpreted the question. "sizes of a tensor" is actually the same as "shape" here. "each_dim_len" is different: it is the total number of elements in a subarray of the current dimension. In the example t = [[1,2], [3,4], [5,6]], I visualize its 1st dim as rows and its 2nd dim as columns. If the current dimension is 0 (rows), then each_dim_len[0] = 2; if the current dimension is 1 (columns), then each_dim_len[1] = 1. So here each_dim_len = (2, 1). "each_dim_len" is mainly used for the conversion between indices and linear_index.
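To connect the two descriptions, a short sketch of mine (not from the PR): for a contiguous tensor, each_dim_len as defined above is exactly the row-major stride of each dimension, which is what the "just use strides" suggestion points at.

    import torch

    def each_dim_len(shape):
        # Elements spanned by one step along each dimension,
        # i.e. the contiguous (row-major) strides.
        out, running = [], 1
        for size in reversed(shape):
            out.append(running)
            running *= size
        return list(reversed(out))

    t = torch.Tensor([[1, 2], [3, 4], [5, 6]])
    print(each_dim_len(tuple(t.size())))  # [2, 1]
    print(t.stride())                     # (2, 1) for this contiguous tensor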
@ssnl sure, I will work on the non-contiguous case. For the nD indices array, please correct me if I am wrong: it is easy to work with this abstraction because even with strides and a linear index we still need some transformation steps (with for loops) to compute the destination linear index, so things might be more transparent using an nD indices array.
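For comparison, a hypothetical stride-only version of the same mapping (my sketch, not proposed code); it avoids materializing an nD index array but, as noted above, still needs the per-dimension loop.

    def flipped_source_index(out_idx, sizes, strides, flip_dims):
        # Peel one coordinate per dimension off the linear index,
        # reflect it if that dimension is flipped, then re-accumulate.
        in_idx, rem = 0, out_idx
        for d, (size, stride) in enumerate(zip(sizes, strides)):
            coord, rem = rem // stride, rem % stride
            if d in flip_dims:
                coord = size - 1 - coord
            in_idx += coord * stride
        return in_idx

    # shape (2, 2, 2), contiguous strides (4, 2, 1), flipping dims 0 and 2
    print(flipped_source_index(2, (2, 2, 2), (4, 2, 1), {0, 2}))  # 7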
Any news on this PR?

@ivan-bilan still trying to handle non-contiguous input in the CUDA implementation, will update in 1-2 days.

For reasons I don't understand, this branch doesn't compile flip() along with the other code for me; I am getting the error AttributeError: 'Tensor' object has no attribute 'flip'. I will close this PR and create a new one.

Replaced by #7873
Summary:
Usage:
x = torch.arange(6).view(2, 3)
y = x.flip(0, 1) # flip along the 1st and 2nd dimensions
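For concreteness, a runnable version of that usage with the expected values worked out by hand:

    import torch

    x = torch.arange(6).view(2, 3)   # [[0, 1, 2],
                                     #  [3, 4, 5]]
    y = x.flip(0, 1)                 # flip along the 1st and 2nd dimensions
    print(y)                         # [[5, 4, 3],
                                     #  [2, 1, 0]]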