Speed up half tensors printing (#141927)

malfet · pytorchmergebot · commit e499b46465bc · 2024-12-03T07:01:49.000Z
This PR removes copycast of reduced precision types to float before printing, that was added in #14418 to probably unblock printing when many operations, like `is_nan` and `max` were not supported on CPUs (Reusing old test plan) Before the PR: ```python In [1]: import torch; a = torch.rand(1, 1700, 34, 50, dtype=torch.float16) In [2]: %timeit str(a) 621 μs ± 5.06 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) ``` after the PR ```python In [1]: import torch; a = torch.rand(1, 1700, 34, 50, dtype=torch.float16) In [2]: %timeit str(a) 449 μs ± 2.34 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) ``` Also, this allows one printing 15Gb Metal tensors on 32GB Mac machine: ``` % python3 -c "import torch;print(torch.empty(72250,72250, device='mps', dtype=torch.float16))" tensor([[0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], ..., [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.]], device='mps:0', dtype=torch.float16) ``` Before this change it failed with non-descriptive ``` % python3 -c "import torch;print(torch.empty(72250,72250, device='mps', dtype=torch.float16))" Traceback (most recent call last): File "<string>", line 1, in <module> import torch;print(torch.empty(72250,72250, device='mps', dtype=torch.float16)) ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/malfet/git/pytorch/pytorch/torch/_tensor.py", line 568, in __repr__ return torch._tensor_str._str(self, tensor_contents=tensor_contents) ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/malfet/git/pytorch/pytorch/torch/_tensor_str.py", line 708, in _str return _str_intern(self, tensor_contents=tensor_contents) File "/Users/malfet/git/pytorch/pytorch/torch/_tensor_str.py", line 625, in _str_intern tensor_str = _tensor_str(self, indent) File "/Users/malfet/git/pytorch/pytorch/torch/_tensor_str.py", line 339, in _tensor_str self = self.float() RuntimeError: Invalid buffer size: 19.45 GB ``` Convert fp8 dtypes to float16, as float range is an overkill Pull Request resolved: #141927 Approved by: https://github.com/ezyang
diff --git a/torch/_tensor_str.py b/torch/_tensor_str.py
@@ -328,18 +328,14 @@ def _tensor_str(self, indent):
     if self.is_neg():
         self = self.resolve_neg()
 
+    # TODO: Remove me when `masked_select` is implemented for FP8
     if self.dtype in [
-        torch.float16,
-        torch.bfloat16,
         torch.float8_e5m2,
         torch.float8_e5m2fnuz,
         torch.float8_e4m3fn,
         torch.float8_e4m3fnuz,
     ]:
-        self = self.float()
-
-    if self.dtype is torch.complex32:
-        self = self.cfloat()
+        self = self.half()
 
     if self.dtype.is_complex:
         # handle the conjugate bit