
Conversation

@lly-zero-one
Contributor

Summary: 1) avoid the use of item() and 2) bypass the im2col step for 1x1 convolutions
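
Not part of the PR's code, just a minimal sketch of why im2col can be skipped for a 1x1 convolution: with a 1x1 kernel, stride 1, and no padding, the convolution is exactly a matrix multiply between the (M, C) weight and the (C, H*W) input pixels, so the im2col expansion only copies data. Shapes below mirror the benchmark in the test plan.

```
import torch

N, C, H, W, M = 1, 512, 4, 4, 512
x = torch.randn(N, C, H, W)
conv = torch.nn.Conv2d(C, M, kernel_size=1, bias=True)

# Regular conv2d path.
y_conv = conv(x)

# Equivalent GEMM: flatten the spatial dims, multiply by the (M, C) weight, add the bias.
w = conv.weight.view(M, C)
y_gemm = (w @ x.view(N, C, H * W)).view(N, M, H, W) + conv.bias.view(1, M, 1, 1)

print(torch.allclose(y_conv, y_gemm, atol=1e-5))  # True up to float rounding
```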

Test Plan: unit test and perf benchmark to show improvement (WIP)

Differential Revision: D22149067

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D22149067

@lly-zero-one lly-zero-one requested review from allwu and ngimel June 20, 2020 00:53
@dr-ci

dr-ci bot commented Jun 20, 2020

💊 CI failures summary and remediations

As of commit 12e8094 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test2 (1/1)

Step: "Test"

```
======================================================================
FAIL [7.469s]: test_mem_leak (__main__.TestProfiler_cuda)
Checks that there's no memory leak when using profiler with CUDA
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_profiler.py", line 42, in test_mem_leak
    self.assertTrue(max_diff < 100 * 1024)
AssertionError: False is not true

----------------------------------------------------------------------
Ran 1 test in 7.469s

FAILED (failures=1)

Generating XML reports...
Generated XML report: test-reports\python-unittest\TEST-TestProfiler_cuda-20200624003608.xml
Traceback (most recent call last):
  File "run_test.py", line 727, in <module>
```

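For context on what the failing check measures: the assertion compares growth in allocated CUDA memory against a 100 KiB threshold while the profiler is active. A rough sketch of that pattern (assumed for illustration, not the actual test_profiler.py code):

```
import torch

def check_profiler_mem_leak(threshold_bytes=100 * 1024):
    # Sketch of the pattern behind test_mem_leak (assumed, not the real test):
    # record allocated CUDA memory, run a workload under the profiler, and
    # assert the allocation delta stays under a small threshold.
    x = torch.randn(1024, 1024, device="cuda")
    torch.cuda.synchronize()
    before = torch.cuda.memory_allocated()
    with torch.autograd.profiler.profile(use_cuda=True):
        for _ in range(10):
            (x @ x).sum()
    torch.cuda.synchronize()
    max_diff = torch.cuda.memory_allocated() - before
    assert max_diff < threshold_bytes, "possible memory leak when profiling with CUDA"
```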

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D22149067

@lly-zero-one lly-zero-one requested a review from allwu June 23, 2020 22:53
Summary:
Pull Request resolved: pytorch#40324

1) avoid the use of item() and 2) bypass the im2col step for 1x1 convolutions

Test Plan:
unit test and perf benchmark to show improvement
```
import numpy as np
import torch
from timeit import Timer

num = 50

# Input / weight shapes for a 1x1 convolution.
N = 1
C = 512
H = 4
W = 4

M = 512
kernel_h = 1
kernel_w = 1
stride_h = 1
stride_w = 1
padding_h = 0
padding_w = 0

X_np = np.random.randn(N, C, H, W).astype(np.float32)
# W_np mirrors the conv weight shape; it is not used further in this snippet.
W_np = np.random.randn(M, C, kernel_h, kernel_w).astype(np.float32)
X = torch.from_numpy(X_np)

conv2d_pt = torch.nn.Conv2d(
    C, M, (kernel_h, kernel_w), stride=(stride_h, stride_w),
    padding=(padding_h, padding_w), groups=1, bias=True)

class ConvNet(torch.nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv2d = conv2d_pt

    def forward(self, x):
        return self.conv2d(x)

model = ConvNet()

def pt_forward():
    # with torch.autograd.profiler.profile(record_shapes=True) as prof:
    model(X)
    # print(prof.key_averages().table(sort_by="self_cpu_time_total"))

# Disable MKL-DNN so the benchmark hits the native conv path.
torch._C._set_mkldnn_enabled(False)

t = Timer("pt_forward()", "from __main__ import pt_forward, X")
```
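The snippet stops after constructing the Timer; how it was run is not shown, but assuming num is the number of timed iterations, the invocation producing the "pt time" figures below would look like:

```
# Assumed invocation (not in the original excerpt): time `num` forward passes.
print("pt time =", t.timeit(number=num))
```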
Before the optimization:
pt time = 5.841153813526034
After the optimization:
pt time = 4.513134760782123

Differential Revision: D22149067

fbshipit-source-id: 7532eb9ffc57c9bc6cc3c95964d8d4c698a83ce8
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D22149067


@allwu allwu left a comment


LGTM!

@facebook-github-bot
Contributor

This pull request has been merged in 7b0f867.
