RFC: Should matmuls use tf32 by default? #67384

@ngimel

Description

Since the release of Ampere GPUs, PyTorch has used TF32 by default for float32 matrix multiplications on Ampere hardware. TF32 provides much better performance at the expense of somewhat lower accuracy, and NVIDIA has conducted extensive experiments showing that the convergence behavior of a wide variety of networks does not change when TF32 is used instead of regular fp32.
However, PyTorch is also used for non-deep-learning workloads and for non-standard deep learning workloads, where the use of TF32 for matrix multiplication has caused a lot of confusion and sometimes bad results.
Going forward, we have a few options:
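To make the accuracy trade-off concrete, here is a small self-contained sketch (plain Python, no GPU or PyTorch required) that simulates TF32's 10-bit mantissa by truncating float32 values and compares a dot product against full precision. Truncation slightly overstates the error of real Ampere hardware, which rounds, so treat the numbers as illustrative only:

```python
import struct

def to_tf32(x):
    """Round-trip a float through a simulated TF32 value.

    TF32 keeps float32's 8-bit exponent but only 10 mantissa bits.
    We simulate this by zeroing the low 13 bits of the 23-bit
    float32 mantissa (a truncation; hardware rounds to nearest).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # drop the low 13 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def dot(xs, ys):
    return sum(a * b for a, b in zip(xs, ys))

# Arbitrary non-dyadic inputs so the mantissa truncation actually bites.
xs = [1.0 + i / 7.0 for i in range(64)]
ys = [2.0 - i / 11.0 for i in range(64)]

exact = dot(xs, ys)
tf32ish = dot([to_tf32(a) for a in xs], [to_tf32(b) for b in ys])
rel_err = abs(exact - tf32ish) / abs(exact)
print(f"relative error: {rel_err:.2e}")
```

The relative error lands around 1e-3 to 1e-4 (TF32's unit roundoff is 2^-11 ≈ 5e-4), versus roughly 1e-7 for fp32: negligible for most training runs, but visible to anyone expecting fp32-exact matmul results.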

  1. Do nothing. Pros: most users get speedups. Cons: issues and confusion will continue.
  2. Leave the default as is, but add a warning to the first call that uses TF32; explicitly setting allow_tf32 would silence the warning (proposed here). Pros: results will no longer be a surprise, and users will be able to get the expected results without resorting to documentation. Cons: PyTorch doesn't warn for normal operations, so this would set a bad precedent.
  3. Use TF32 in nn layers (linear, RNN, etc.) but disallow it for raw matmul operations (proposed here). Pros: networks built from nn operations still get the speedup, while people experimenting with pure matmul operations get exact results. Cons: it is really confusing when a linear layer, which calls matmul under the hood, produces different results than a direct matmul call. Also, anyone who defines a network with matmuls instead of nn.Linear or nn.functional.linear won't see the speedup.
  4. Disable TF32 matmul by default, possibly with a warning that it can be enabled. Pros: no surprises, accurate results. Cons: users who currently get speedups will have to enable TF32 manually or see reduced performance; if there's a warning, it has the same downside as option 2.
    In my conversations with power users, they lean towards option 4; we should use this issue to find an acceptable solution.
    cc @zasdfgbnm @ptrblck @CrisHY1995 @ssnl @stas00 @t-vi @csarofeen @wjablonski-work, please copy anyone else who is interested in this discussion.
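For reference, the switch being debated is the existing allow_tf32 flag. A minimal sketch of how a user opts out (or back in) today, guarded with a try/except so it also runs where torch is not installed:

```python
try:
    import torch

    # The matmul flag is the one under discussion in this RFC;
    # cuDNN convolutions are controlled by a separate, independent flag.
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False

    matmul_tf32 = torch.backends.cuda.matmul.allow_tf32
except ImportError:
    matmul_tf32 = None  # torch unavailable in this environment
```

Under options 2 and 4, setting either flag explicitly is also what would silence the proposed warning.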

Metadata

    Labels

    module: tf32 (Related to the tf32 data format)
    triaged (This issue has been looked at by a team member, triaged, and prioritized into an appropriate module)
