Slow addmm caused by a bug in the CPU backend #5047

I think the [addmm](https://github.com/pytorch/pytorch/blob/master/aten/src/TH/generic/THTensorMath.c#L1936) implementation has a bug.
I am confused by how the matrix transpose is determined. For a matrix of size `m x k`, why is it treated as transposed when `stride[1] == 1 && LDC_COND(r_->size[1], r_->size[0], r_->stride[0])` holds? A stride of 1 in the second dimension should mean that the matrix is not transposed. Am I misunderstanding this? Am I missing some established usage convention? Or does the code follow `CblasColMajor`?
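To make the question concrete, here is a minimal sketch of the index arithmetic involved. It assumes a column-major convention (as in Fortran BLAS, or `CblasColMajor`), which may or may not be what the code intends. The helper names `strided_at`, `colmajor_at` and `rowmajor_is_colmajor_transpose` are hypothetical and do not exist in TH.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: element (i, j) of a strided 2-D view. */
static double strided_at(const double *data, size_t i, size_t j,
                         size_t stride0, size_t stride1)
{
    assert(data != NULL);
    return data[i * stride0 + j * stride1];
}

/* Hypothetical helper: element (i, j) of a column-major matrix
   with leading dimension ld. */
static double colmajor_at(const double *data, size_t i, size_t j, size_t ld)
{
    assert(data != NULL);
    return data[i + j * ld];
}

/* Returns 1 if the m x k view with strides {k, 1} (stride[1] == 1) reads
   the same elements as the transpose of a column-major k x m matrix with
   leading dimension k stored in the same buffer. */
static int rowmajor_is_colmajor_transpose(const double *data,
                                          size_t m, size_t k)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < k; ++j)
            if (strided_at(data, i, j, k, 1) != colmajor_at(data, j, i, k))
                return 0;
    return 1;
}
```

Under that assumption, a tensor with `stride[1] == 1` looks like a transposed matrix to a column-major routine, and `stride[0]` plays the role of the leading dimension.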

However, the [comment](https://github.com/pytorch/pytorch/blob/master/aten/src/TH/generic/THTensorMath.c#L2007) conflicts with [the other one](https://github.com/pytorch/pytorch/blob/master/aten/src/TH/generic/THTensorMath.c#L2028), and the corresponding code does not look right either.

Unless I misunderstand the matrix transpose, the code at L2006~L2025 should instead be:
```c
  /* m1 */
  /* Need ldm1_ >= max(1, (transpose_m1 == 't' ? k : m)) */
  if(m1->stride[(transpose_r == 'n' ? 0 : 1)] == 1 &&
     m1->stride[(transpose_r == 'n' ? 1 : 0)] >= THMax(1, m))
  {
    transpose_m1 = 'n';
    m1_ = m1;
  }
  else if(m1->stride[(transpose_r == 'n' ? 1 : 0)] == 1 &&
          m1->stride[(transpose_r == 'n' ? 0 : 1)] >= THMax(1, k))
  {
    transpose_m1 = 't';
    m1_ = m1;
  }
  else
  {
    transpose_m1 = (transpose_r == 'n' ? 't' : 'n');
    m1_ = THTensor_(newContiguous)(m1);
    free_m1 = 1;
  }
```
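For easier checking, the branch selection in the snippet above can be modelled as a standalone function. This is only a sketch of the proposed logic, not the actual TH code: `pick_m1_layout` and `max_long` are hypothetical names, and the return value `'c'` is an assumed marker for the case where a contiguous copy would be made.

```c
#include <assert.h>

/* Hypothetical stand-in for THMax. */
static long max_long(long a, long b) { return a > b ? a : b; }

/* Returns 'n' or 't' when m1 could be passed to gemm without a copy,
   or 'c' when the snippet would fall through to newContiguous. */
static char pick_m1_layout(char transpose_r, const long stride[2],
                           long m, long k)
{
    assert(transpose_r == 'n' || transpose_r == 't');
    int unit = (transpose_r == 'n') ? 0 : 1; /* index tested for stride 1 first */
    int lead = 1 - unit;                     /* index used as leading dimension */

    if (stride[unit] == 1 && stride[lead] >= max_long(1, m))
        return 'n';
    if (stride[lead] == 1 && stride[unit] >= max_long(1, k))
        return 't';
    return 'c';
}
```

With `m = 2`, `k = 3` and `transpose_r == 'n'`, strides `{1, 2}` select `'n'`, strides `{3, 1}` select `'t'`, and strides `{6, 2}` require a copy.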



