Optimize LeakyReLU and PReLU 'forward' functions on the CPU #9206
Conversation
apaszke
left a comment
Wow, that's nice. It looks like the vectorization pass can't deal with the original code but has no issue with the later version: https://godbolt.org/g/j5XJr3 (the result looks similar across many different compiler versions). It might be due to the lack of -ffast-math: one path doesn't use multiplication, the other one does, so the compiler has to be careful.
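For illustration, here is a minimal sketch of the two loop shapes being discussed (hypothetical code, not the actual PR diff; the function names are made up). In the branchy form the positive path skips the multiply, so the vectorizer must prove the transformation is safe; in the rewritten form both paths go through a multiply, which auto-vectorizes readily even without -ffast-math:

```cpp
#include <cstddef>

// Branchy formulation: the positive path returns the input unchanged, so
// only one path multiplies and the compiler is conservative about vectorizing.
void leaky_relu_branchy(const float* in, float* out, std::size_t n, float slope) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = (in[i] > 0.f) ? in[i] : in[i] * slope;
  }
}

// Rewritten formulation: both paths go through a multiply, so the loop body
// is a straight-line select that the auto-vectorizer handles easily.
void leaky_relu_select(const float* in, float* out, std::size_t n, float slope) {
  for (std::size_t i = 0; i < n; ++i) {
    const float s = (in[i] > 0.f) ? 1.f : slope;
    out[i] = in[i] * s;
  }
}
```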
facebook-github-bot
left a comment
@ssnl has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary:
This looks like a totally cosmetic change, but for some reason it reduces the runtime by ~50% running in a single CPU thread.
```
import os
os.environ['OMP_NUM_THREADS'] = '1'  # Use one CPU thread
import torch, torch.nn as nn, time

def test_net(net, offset):
    net.eval()
    total = 0
    with torch.no_grad():
        for _ in range(100):
            x = torch.randn(100, 100, 100) + offset
            start_time = time.time()
            y = net(x)
            total += time.time() - start_time
    print(net, total * 10, 'ms')

for offset in [-1, 0, +1]:
    test_net(nn.LeakyReLU(), offset)
    test_net(nn.PReLU(), offset)
```
Closes pytorch/pytorch#9206
Reviewed By: yf225
Differential Revision: D8749491
Pulled By: btgraham
fbshipit-source-id: 3db8049dd151c0ba9ae1dd5c05bcc58bcab97e9a