
torch.mm gives wrong results on certain combinations of input size, device type, and cuBLAS version. #22078

@umanwizard

Description

🐛 Bug

To Reproduce

Ensure that you are running CUDA 9.0 on a Maxwell or Pascal device, and execute the following script:

import sys
import torch

print('Torch version:', torch.__version__)
print('sys.version', sys.version)

# Each row count below exceeds 2**21 (see "Environment" for why that matters).
for n_rows in [
        0b01000000000000000000001,
        0b01000000000000000000010,
        0b10000010010000010001110,
        0b11000000000000000000000
        ]:
    a = torch.ones(n_rows, 2).float().cuda()
    b = torch.ones(2, 2).float().cuda()
    # ones(n_rows, 2) @ ones(2, 2) is all 2s, so this should print 0.0.
    print((torch.mm(a, b) - 2).abs().max().item())

If the script prints anything other than "0.0" on any line, you have reproduced the bug.

To reproduce the issue in pure cuBLAS code (with no reference to PyTorch), see this gist: https://gist.github.com/umanwizard/2b2e2fc12485ef6dc1cdfb1421276dd9
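The gist linked above is the authoritative pure-cuBLAS reproduction; as a rough sketch of the same idea (not the gist's exact code — variable names are ours, error handling is elided, and it assumes a single affected GPU), it amounts to one plain `cublasSgemm` call with the row count just above 2^21 and a check that every output entry is 2.0f:

```cuda
// Sketch only (see the gist for the exact code): multiply an (m x 2)
// matrix of ones by a (2 x 2) matrix of ones and count wrong entries.
#include <cstdio>
#include <vector>
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    const int m = (1 << 21) + 1;  // rows just above 2^21 (condition 2 below)
    const int k = 2, n = 2;

    std::vector<float> h_a(size_t(m) * k, 1.0f);
    std::vector<float> h_b(size_t(k) * n, 1.0f);
    std::vector<float> h_c(size_t(m) * n);

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, h_a.size() * sizeof(float));
    cudaMalloc(&d_b, h_b.size() * sizeof(float));
    cudaMalloc(&d_c, h_c.size() * sizeof(float));
    cudaMemcpy(d_a, h_a.data(), h_a.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b.data(), h_b.size() * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    // Column-major C = A * B, where C is m x n; every entry should be 2.0f.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                m, n, k, &alpha, d_a, m, d_b, k, &beta, d_c, m);
    cudaMemcpy(h_c.data(), d_c, h_c.size() * sizeof(float), cudaMemcpyDeviceToHost);

    long bad = 0;
    for (float v : h_c) if (v != 2.0f) ++bad;
    printf("wrong entries: %ld\n", bad);  // non-zero on affected setups

    cublasDestroy(handle);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```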

Environment

The following are known to be necessary conditions for the issue:

  1. cuBLAS version is less than 9.2
  2. At least one dimension of the matrix is larger than 2^21
  3. Device architecture is Maxwell or Pascal
  4. Data type is float or half

Other than that, we don't know the exact conditions under which it triggers.
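As a sanity check on condition 2, every row count in the repro script does exceed 2^21 — two of them by only 1 or 2 rows. A standalone check (plain Python, no GPU needed; the names are ours, not from the issue):

```python
# The four row counts from the repro script, written as in the issue.
REPRO_SIZES = [
    0b01000000000000000000001,  # 2**21 + 1
    0b01000000000000000000010,  # 2**21 + 2
    0b10000010010000010001110,
    0b11000000000000000000000,  # 3 * 2**21
]

THRESHOLD = 1 << 21  # 2,097,152: the cutoff named in condition 2

for n_rows in REPRO_SIZES:
    print(n_rows, n_rows > THRESHOLD)
```

This prints `True` for all four sizes, so condition 2 is consistent with the observed failures.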

cc @ezyang @gchanan @zou3519

    Labels

    module: cublas (Problem related to cublas support); triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
