Skip to content

Conversation

@xuhdev
Copy link
Collaborator

@xuhdev xuhdev commented Sep 22, 2020

Benchmark (Debian 10, Release build, gcc 8.3, no turbo, Intel(R) Xeon(R)
E-2136 CPU @ 3.30GHz):

import timeit
for dtype in ('torch.int64', 'torch.int32', 'torch.int16', 'torch.int8', 'torch.uint8'):
    for n, t in [(10_000, 100000),
                (100_000, 10000)]:
        print(f'torch.bitwise_not(a), numel() == {n} for {t} times, dtype={dtype}')
        print(timeit.timeit('torch.bitwise_not(a)', setup=f'import torch; a = torch.arange(-{n//2}, {n//2}, dtype={dtype})', number=t))

Before:

torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int64
0.5479081739904359
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int64
0.3350257440470159
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int32
0.39590477803722024
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int32
0.25563537096604705
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int16
0.31152817397378385
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int16
0.20817365101538599
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int8
0.8573925020173192
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int8
0.4150037349900231
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.uint8
0.8551108679967001
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.uint8
0.37137620500288904

After:

torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int64
0.5232444299617782
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int64
0.33852163201663643
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int32
0.3931163849774748
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int32
0.24392802000511438
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int16
0.3122224889229983
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int16
0.1977886479580775
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int8
0.26711542706470937
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int8
0.18208567495457828
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.uint8
0.2615354140289128
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.uint8
0.17972210398875177

Benchmark (Debian 10, Release build, gcc 8.3, no turbo, Intel(R) Xeon(R)
E-2136 CPU @ 3.30GHz):

```python
import timeit
for dtype in ('torch.int64', 'torch.int32', 'torch.int16', 'torch.int8', 'torch.uint8'):
    for n, t in [(10_000, 100000),
                (100_000, 10000)]:
        print(f'torch.bitwise_not(a), numel() == {n} for {t} times, dtype={dtype}')
        print(timeit.timeit('torch.bitwise_not(a)', setup=f'import torch; a = torch.arange(-{n//2}, {n//2}, dtype={dtype})', number=t))
```

Before:

```
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int64
0.5479081739904359
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int64
0.3350257440470159
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int32
0.39590477803722024
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int32
0.25563537096604705
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int16
0.31152817397378385
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int16
0.20817365101538599
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int8
0.8573925020173192
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int8
0.4150037349900231
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.uint8
0.8551108679967001
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.uint8
0.37137620500288904
```

After:

```
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int64
0.5232444299617782
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int64
0.33852163201663643
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int32
0.3931163849774748
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int32
0.24392802000511438
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int16
0.3122224889229983
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int16
0.1977886479580775
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.int8
0.26711542706470937
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.int8
0.18208567495457828
torch.bitwise_not(a), numel() == 10000 for 100000 times, dtype=torch.uint8
0.2615354140289128
torch.bitwise_not(a), numel() == 100000 for 10000 times, dtype=torch.uint8
0.17972210398875177
```
@xuhdev xuhdev added module: cpu CPU specific problem (e.g., perf, algorithm) module: operators labels Sep 22, 2020
@codecov
Copy link

codecov bot commented Sep 22, 2020

Codecov Report

Merging #45103 into master will increase coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #45103   +/-   ##
=======================================
  Coverage   67.85%   67.85%           
=======================================
  Files         384      384           
  Lines       50026    50026           
=======================================
+ Hits        33944    33945    +1     
+ Misses      16082    16081    -1     
Impacted Files Coverage Δ
torch/testing/_internal/expecttest.py 78.57% <0.00%> (+1.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 20f52cd...15522eb. Read the comment docs.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ezyang merged this pull request in 536580e.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged module: cpu CPU specific problem (e.g., perf, algorithm) open source

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants