
Conversation

@mingfeima (Collaborator) commented Mar 31, 2020

This PR aims to improve LayerNorm performance on CPU for both the forward and backward paths.

Results on Xeon 6248:

  1. single socket inference 1.14x improvement
  2. single core inference 1.77x improvement
  3. single socket training 6.27x improvement

Fine-tuning GPT-2 for language modeling on the WikiText-2 dataset on dual sockets, the time per iteration dropped from 4.69 s/it to 3.16 s/it, a 1.48x improvement.

@dr-ci (bot) commented Mar 31, 2020

💊 CI failures summary and remediations

As of commit 361002a (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@mingfeima (Collaborator, Author) commented Mar 31, 2020

Currently, the forward path of LayerNorm is only partially vectorized, and the backward path is neither parallelized nor vectorized.

Results on Xeon 6248 (2x20 cores @ 2.5 GHz). Use the benchmark to reproduce: ./run.sh layernorm.py (a rough stand-in timing sketch is shown after the table below).

Input size: [128, 128, 1024]; normalized shape: [1024]. Unit: ms per iteration (lower is better).

| Scenario | original | this PR | speedup |
| --- | --- | --- | --- |
| single socket inference | 2.16 | 1.89 | 1.14x |
| single core inference | 43.44 | 24.59 | 1.77x |
| single socket training | 42.78 | 6.82 | 6.27x |

Fine-tuning result for GPT-2 language modeling on the WikiText-2 dataset:

### before
| 591/591 [46:11<00:00,  4.69s/it]
### after
| 591/591 [31:09<00:00,  3.16s/it]

@vincentqb vincentqb requested a review from yf225 March 31, 2020 20:10
@vincentqb vincentqb added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Mar 31, 2020
@yf225 yf225 requested review from VitalyFedyunin and ngimel and removed request for yf225 April 2, 2020 02:38
@facebook-github-bot (Contributor) left a comment

@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@yf225 yf225 requested review from yf225 and removed request for yf225 April 2, 2020 02:39
@yf225 yf225 added the module: cpu (CPU specific problem, e.g., perf, algorithm) and module: performance (Issues related to performance, either of kernel code or framework glue) labels Apr 2, 2020
@yf225 (Contributor) commented Apr 2, 2020

@VitalyFedyunin @ngimel Would you like to review this PR? Thanks! (Sorry that the internal diff was created by mistake, and please feel free to commandeer it)

@mingfeima (Collaborator, Author)

@ngimel could you please review this one?

@VitalyFedyunin (Contributor)

@glaringlee can you please do an initial review? Thanks.

@mingfeima (Collaborator, Author)

Rebased! Please review @glaringlee @VitalyFedyunin @ngimel.

cc @jgong5

@glaringlee (Contributor) commented Aug 3, 2020

@mingfeima
Please rebase the code; the macOS CI test failed, and a fix has been pushed to the master branch.

@mingfeima (Collaborator, Author) commented Aug 10, 2020

Hi @glaringlee, I am seeing some upstream breakage during the rebase:

fatal: reference is not a tree: f015d698006c4a11be15b1ebb75b3b9bb317b914
Unable to checkout 'f015d698006c4a11be15b1ebb75b3b9bb317b914' in submodule path 'third_party/tensorpipe'

The last 7 commits of the tensorpipe repo do not match the commit pinned in pytorch's third_party submodule:
https://github.com/pytorch/tensorpipe/commits/f015d698006c4a11be15b1ebb75b3b9bb317b914
https://github.com/pytorch/tensorpipe/commits/master

Any idea how to solve this?

@glaringlee (Contributor)

> Hi @glaringlee, I am seeing some upstream breakage during the rebase:
>
> fatal: reference is not a tree: f015d698006c4a11be15b1ebb75b3b9bb317b914
> Unable to checkout 'f015d698006c4a11be15b1ebb75b3b9bb317b914' in submodule path 'third_party/tensorpipe'
>
> The last 7 commits of the tensorpipe repo do not match the commit pinned in pytorch's third_party submodule:
> https://github.com/pytorch/tensorpipe/commits/f015d698006c4a11be15b1ebb75b3b9bb317b914
> https://github.com/pytorch/tensorpipe/commits/master
>
> Any idea how to solve this?

@mingfeima please try again, we just fixed it.

@mingfeima mingfeima requested a review from apaszke as a code owner August 11, 2020 06:00
@mingfeima (Collaborator, Author)

@glaringlee thanks for the prompt fix!

@glaringlee (Contributor)

@VitalyFedyunin This looks good to me now, please advise.

@facebook-github-bot (Contributor) left a comment

@glaringlee has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@glaringlee merged this pull request in 686705c.


Labels

Merged · module: cpu (CPU specific problem, e.g., perf, algorithm) · module: performance (Issues related to performance, either of kernel code or framework glue) · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)


8 participants