You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Move abs, frac, reciprocal, and neg to TensorIterator (#19041)
Summary:
I've been messing around with vectorizing the fusion compiler in JIT, and noticed that these ops were pathologically slow. I moved them to use TensorIterator + Vec256<> and got some speed wins.
Benchmark script:
```
import torch, time
ops = ['abs', 'neg', 'reciprocal', 'frac']
x = torch.rand(1024, 1024)
NITER = 10000
print('op', 'time per iter (ms)', 'gops/s', 'GB/s', sep='\t')
for op in ops:
s = time.time()
for i in range(NITER):
getattr(x, op)()
elapsed_sec = ((time.time() - s) / NITER)
print(op, elapsed_sec * 1000, (1024*1024/elapsed_sec)/1e9, (1024*1024*4*2) / elapsed_sec / 1e9, sep='\t')
```
Before this change (on my mac with a skylake):
```
op time per iter (ms) gops/s GB/s
abs 0.9730974197387695 1.0775652866097343 8.620522292877874
neg 1.0723679780960083 0.9778136063534356 7.822508850827485
reciprocal 1.2610594034194946 0.8315040490215421 6.6520323921723366
frac 1.1681334018707275 0.8976509004200546 7.181207203360437
```
After this change:
```
op time per iter (ms) gops/s GB/s
abs 0.5031076192855835 2.084198210889721 16.673585687117768
neg 0.4433974027633667 2.3648672578256087 18.91893806260487
reciprocal 0.47145988941192624 2.2241043693195985 17.79283495455679
frac 0.5036592721939087 2.0819154096627024 16.65532327730162
```
So, after this change it looks like we are hitting machine peak for bandwidth and are bandwidth bound.
Pull Request resolved: #19041
Differential Revision: D14862037
Pulled By: jamesr66a
fbshipit-source-id: e2032ac0ca962dbf4120bb36812277c260e22912
0 commit comments