@btgraham commented Jul 6, 2018

This looks like a purely cosmetic change, but for some reason it cuts the runtime by about 50% when running on a single CPU thread.

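```python
import os
os.environ['OMP_NUM_THREADS'] = '1'  # Use one CPU thread; set before importing torch so OpenMP picks it up
import torch, torch.nn as nn, time

def test_net(net, offset):
    net.eval()
    total = 0.0
    with torch.no_grad():
        for _ in range(100):
            # Shift the inputs so they are mostly negative (-1), mixed (0) or mostly positive (+1)
            x = torch.randn(100, 100, 100) + offset
            start_time = time.perf_counter()  # time only the forward pass, not input creation
            y = net(x)
            total += time.perf_counter() - start_time
    # total seconds over 100 calls -> average milliseconds per call
    print(net, total * 10, 'ms')

for offset in [-1, 0, +1]:
    test_net(nn.LeakyReLU(), offset)
    test_net(nn.PReLU(), offset)
```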
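The benchmark times only `net(x)`, not the `torch.randn` call, and `total*10` turns the total seconds over 100 calls into an average in milliseconds (total × 1000 / 100). A stdlib-only sketch of that same timing pattern follows; the helper name `mean_call_ms` is hypothetical, and it uses `time.perf_counter`, which Python documents as the clock for measuring short durations.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def mean_call_ms(fn: Callable[[T], object], make_input: Callable[[], T], repeats: int = 100) -> float:
    """Average wall-clock milliseconds per fn(x) call; input creation is not timed."""
    total = 0.0
    for _ in range(repeats):
        x = make_input()  # excluded from timing, like torch.randn(...) in the benchmark
        start = time.perf_counter()
        fn(x)
        total += time.perf_counter() - start
    # seconds -> milliseconds, averaged over the calls; equals total * 10 when repeats == 100
    return total * 1000.0 / repeats


if __name__ == "__main__":
    import random
    # Pure-Python stand-in workload (hypothetical): a list-based leaky ReLU on 10k Gaussian samples.
    ms = mean_call_ms(lambda xs: [v if v > 0 else v * 0.01 for v in xs],
                      lambda: [random.gauss(0.0, 1.0) for _ in range(10_000)])
    print(f"{ms:.3f} ms per call")
```

Keeping input generation outside the timed region matters here because `torch.randn(100,100,100)` is itself a sizeable amount of work compared with an elementwise activation.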


@apaszke left a comment

Wow, that's nice. It looks like the compiler's vectorization pass can't deal with the original code but has no issue with the new version: https://godbolt.org/g/j5XJr3 (the output looks similar across many different compiler versions). It might be due to the lack of -ffast-math: one path doesn't use a multiplication and the other one does, so the compiler has to be careful.
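To make the "one path multiplies, the other doesn't" distinction concrete, here is a minimal scalar sketch of two shapes a LeakyReLU kernel could take. This is an assumption for illustration only: the thread does not show the actual diff, and the names `leaky_branchy`, `leaky_select_then_multiply` and `same_float` are hypothetical. Python will not vectorize either version; the sketch only shows that picking a scale factor first and then multiplying on every path gives the same results as the branchy form, including for signed zeros, infinities and NaN.

```python
import math

NEGATIVE_SLOPE = 0.01  # nn.LeakyReLU's default negative_slope


def leaky_branchy(x: float, a: float = NEGATIVE_SLOPE) -> float:
    # Positive path returns x untouched; only the negative path multiplies.
    return x if x > 0 else x * a


def leaky_select_then_multiply(x: float, a: float = NEGATIVE_SLOPE) -> float:
    # Pick a scale factor first, then multiply on every path.
    r = 1.0 if x > 0 else a
    return x * r


def same_float(p: float, q: float) -> bool:
    # Treats NaN as equal to NaN and distinguishes 0.0 from -0.0.
    if math.isnan(p) or math.isnan(q):
        return math.isnan(p) and math.isnan(q)
    return p == q and math.copysign(1.0, p) == math.copysign(1.0, q)


samples = [-3.5, -1.0, -0.0, 0.0, 1e-300, 2.0, math.inf, -math.inf, math.nan]
for v in samples:
    assert same_float(leaky_branchy(v), leaky_select_then_multiply(v)), v
print("both formulations agree on", len(samples), "samples")
```

The second form applies the same select-and-multiply step to every element, which is the uniform loop body auto-vectorizers handle most readily; whether a given compiler actually vectorizes either form is what the godbolt link explores.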


@facebook-github-bot left a comment

@ssnl has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 7, 2018
Closes pytorch/pytorch#9206

Reviewed By: yf225

Differential Revision: D8749491

Pulled By: btgraham

fbshipit-source-id: 3db8049dd151c0ba9ae1dd5c05bcc58bcab97e9a
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 13, 2018
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
@ssnl mentioned this pull request Sep 19, 2018