
Conversation

@lly-zero-one
Contributor

Summary: 1) avoid the use of item() and 2) bypass the im2col step for 1x1 convolutions
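
Not part of the PR's code, just a minimal sketch of why im2col can be skipped for a 1x1 convolution: with a 1x1 kernel, stride 1, and no padding, the convolution is exactly a matrix multiply between the (M, C) weight and the (C, H*W) input pixels, so the im2col expansion only copies data. Shapes below mirror the benchmark in the test plan.

```
import torch

N, C, H, W, M = 1, 512, 4, 4, 512
x = torch.randn(N, C, H, W)
conv = torch.nn.Conv2d(C, M, kernel_size=1, bias=True)

# Regular conv2d path.
y_conv = conv(x)

# Equivalent GEMM: flatten the spatial dims, multiply by the (M, C) weight, add the bias.
w = conv.weight.view(M, C)
y_gemm = (w @ x.view(N, C, H * W)).view(N, M, H, W) + conv.bias.view(1, M, 1, 1)

print(torch.allclose(y_conv, y_gemm, atol=1e-5))  # True up to float rounding
```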

Test Plan: unit test and perf benchmark to show improvement (WIP)

Differential Revision: D22149067

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D22149067

@lly-zero-one lly-zero-one requested review from allwu and ngimel June 20, 2020 00:53
@dr-ci

dr-ci bot commented Jun 20, 2020

💊 CI failures summary and remediations

As of commit 12e8094 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test2 (1/1)

Step: "Test"

```
======================================================================
FAIL [7.469s]: test_mem_leak (__main__.TestProfiler_cuda)
Checks that there's no memory leak when using profiler with CUDA
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_profiler.py", line 42, in test_mem_leak
    self.assertTrue(max_diff < 100 * 1024)
AssertionError: False is not true

----------------------------------------------------------------------
Ran 1 test in 7.469s

FAILED (failures=1)

Generating XML reports...
Generated XML report: test-reports\python-unittest\TEST-TestProfiler_cuda-20200624003608.xml
Traceback (most recent call last):
  File "run_test.py", line 727, in <module>
```

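For context on what the failing check measures: the assertion compares growth in allocated CUDA memory against a 100 KiB threshold while the profiler is active. A rough sketch of that pattern (assumed for illustration, not the actual test_profiler.py code):

```
import torch

def check_profiler_mem_leak(threshold_bytes=100 * 1024):
    # Sketch of the pattern behind test_mem_leak (assumed, not the real test):
    # record allocated CUDA memory, run a workload under the profiler, and
    # assert the allocation delta stays under a small threshold.
    x = torch.randn(1024, 1024, device="cuda")
    torch.cuda.synchronize()
    before = torch.cuda.memory_allocated()
    with torch.autograd.profiler.profile(use_cuda=True):
        for _ in range(10):
            (x @ x).sum()
    torch.cuda.synchronize()
    max_diff = torch.cuda.memory_allocated() - before
    assert max_diff < threshold_bytes, "possible memory leak when profiling with CUDA"
```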

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D22149067

@lly-zero-one lly-zero-one requested a review from allwu June 23, 2020 22:53
Summary:
Pull Request resolved: pytorch#40324

1) avoid the use of item() and 2) bypass the im2col step for 1x1 convolutions

Test Plan:
unit test and perf benchmark to show improvement
```
import numpy as np
import torch
from timeit import Timer

num = 50

# Input / weight shapes for a 1x1 convolution.
N = 1
C = 512
H = 4
W = 4

M = 512
kernel_h = 1
kernel_w = 1
stride_h = 1
stride_w = 1
padding_h = 0
padding_w = 0

X_np = np.random.randn(N, C, H, W).astype(np.float32)
# W_np mirrors the conv weight shape; it is not used further in this snippet.
W_np = np.random.randn(M, C, kernel_h, kernel_w).astype(np.float32)
X = torch.from_numpy(X_np)

conv2d_pt = torch.nn.Conv2d(
    C, M, (kernel_h, kernel_w), stride=(stride_h, stride_w),
    padding=(padding_h, padding_w), groups=1, bias=True)

class ConvNet(torch.nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv2d = conv2d_pt

    def forward(self, x):
        return self.conv2d(x)

model = ConvNet()

def pt_forward():
    # with torch.autograd.profiler.profile(record_shapes=True) as prof:
    model(X)
    # print(prof.key_averages().table(sort_by="self_cpu_time_total"))

# Disable MKL-DNN so the benchmark hits the native conv path.
torch._C._set_mkldnn_enabled(False)

t = Timer("pt_forward()", "from __main__ import pt_forward, X")
```
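The snippet stops after constructing the Timer; how it was run is not shown, but assuming num is the number of timed iterations, the invocation producing the "pt time" figures below would look like:

```
# Assumed invocation (not in the original excerpt): time `num` forward passes.
print("pt time =", t.timeit(number=num))
```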
Before the optimization:
pt time = 5.841153813526034
After the optimization:
pt time = 4.513134760782123

Differential Revision: D22149067

fbshipit-source-id: 7532eb9ffc57c9bc6cc3c95964d8d4c698a83ce8
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D22149067


@allwu allwu left a comment


LGTM!

@facebook-github-bot
Contributor

This pull request has been merged in 7b0f867.
