
Conversation

@mingfeima (Collaborator) commented Mar 31, 2020

This PR aims to improve LayerNorm performance on CPU for both the forward and backward paths.

Results on Xeon 6248:

  1. single socket inference 1.14x improvement
  2. single core inference 1.77x improvement
  3. single socket training 6.27x improvement

Fine-tuning GPT-2 for language modeling on the WikiText-2 dataset on dual sockets, the time per iteration dropped from 4.69 s/it to 3.16 s/it, a 1.48x improvement.

@dr-ci (bot) commented Mar 31, 2020

💊 CI failures summary and remediations

As of commit 361002a (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@mingfeima (Collaborator, Author) commented Mar 31, 2020

Currently, the forward path of LayerNorm is only partially vectorized, and the backward path is neither parallelized nor vectorized.

Results on Xeon 6248 (2x20 cores @ 2.5 GHz). Use the benchmark to reproduce: ./run.sh layernorm.py (a rough stand-in timing sketch is shown after the table below).

Input size: [128, 128, 1024]; normalized shape: [1024]. Unit: ms per iteration (lower is better).

| Scenario | original | this PR | speedup |
| --- | --- | --- | --- |
| single socket inference | 2.16 | 1.89 | 1.14x |
| single core inference | 43.44 | 24.59 | 1.77x |
| single socket training | 42.78 | 6.82 | 6.27x |

Fine-tuning result for GPT-2 language modeling on the WikiText-2 dataset:

### before
| 591/591 [46:11<00:00,  4.69s/it]
### after
| 591/591 [31:09<00:00,  3.16s/it]

@vincentqb vincentqb requested a review from yf225 March 31, 2020 20:10
@vincentqb vincentqb added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Mar 31, 2020
@yf225 yf225 requested review from VitalyFedyunin and ngimel and removed request for yf225 April 2, 2020 02:38
@facebook-github-bot (Contributor) left a comment

@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@yf225 yf225 requested review from yf225 and removed request for yf225 April 2, 2020 02:39
@yf225 yf225 added the module: cpu (CPU specific problem, e.g., perf, algorithm) and module: performance (Issues related to performance, either of kernel code or framework glue) labels Apr 2, 2020
@yf225 (Contributor) commented Apr 2, 2020

@VitalyFedyunin @ngimel Would you like to review this PR? Thanks! (Sorry that the internal diff was created by mistake, and please feel free to commandeer it)

@mingfeima (Collaborator, Author)

@ngimel could you please review this one?

@VitalyFedyunin (Contributor)

@glaringlee can you please do an initial review? Thanks.

@mingfeima (Collaborator, Author)

Rebased! Please review @glaringlee @VitalyFedyunin @ngimel.

cc @jgong5

@glaringlee (Contributor) commented Aug 3, 2020

@mingfeima
Please rebase the code; the macOS CI test failed, and a fix has been pushed to the master branch.

@mingfeima (Collaborator, Author) commented Aug 10, 2020

Hi @glaringlee, I am seeing some upstream breakage during the rebase:

fatal: reference is not a tree: f015d698006c4a11be15b1ebb75b3b9bb317b914
Unable to checkout 'f015d698006c4a11be15b1ebb75b3b9bb317b914' in submodule path 'third_party/tensorpipe'

The last 7 commits of the tensorpipe repo do not match the commit pinned in pytorch's third_party submodule:
https://github.com/pytorch/tensorpipe/commits/f015d698006c4a11be15b1ebb75b3b9bb317b914
https://github.com/pytorch/tensorpipe/commits/master

Any idea how to solve this?

@glaringlee (Contributor)

> Hi @glaringlee, I am seeing some upstream breakage during the rebase:
>
> fatal: reference is not a tree: f015d698006c4a11be15b1ebb75b3b9bb317b914
> Unable to checkout 'f015d698006c4a11be15b1ebb75b3b9bb317b914' in submodule path 'third_party/tensorpipe'
>
> The last 7 commits of the tensorpipe repo do not match the commit pinned in pytorch's third_party submodule:
> https://github.com/pytorch/tensorpipe/commits/f015d698006c4a11be15b1ebb75b3b9bb317b914
> https://github.com/pytorch/tensorpipe/commits/master
>
> Any idea how to solve this?

@mingfeima please try again, we just fixed it.

@mingfeima mingfeima requested a review from apaszke as a code owner August 11, 2020 06:00
@mingfeima (Collaborator, Author)

@glaringlee thanks for the prompt fix!

@glaringlee (Contributor)

@VitalyFedyunin This looks good to me now, please advise.

@facebook-github-bot (Contributor) left a comment

@glaringlee has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@glaringlee merged this pull request in 686705c.


Labels

Merged · module: cpu (CPU specific problem, e.g., perf, algorithm) · module: performance (Issues related to performance, either of kernel code or framework glue) · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)


8 participants