Optimize LayerNorm performance on CPU both forward and backward #35750
💊 CI failures summary and remediations
As of commit 361002a (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚
This comment was automatically generated by Dr. CI. This comment has been revised 27 times.
Currently, forward path of
Results on Xeon 6248, 2x20 cores @ 2.5GHz. Use benchmark to reproduce. Input size:
Fine-tuning result for GPT-2 language modeling on the WikiText-2 dataset:
### before
591/591 [46:11<00:00, 4.69s/it]
### after
591/591 [31:09<00:00, 3.16s/it]
facebook-github-bot
left a comment
@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@VitalyFedyunin @ngimel Would you like to review this PR? Thanks! (Sorry that the internal diff was created by mistake; please feel free to commandeer it.)
@ngimel could you please review this one?
@glaringlee can you please make an initial review? Thanks.
Rebased! Please review @glaringlee @VitalyFedyunin @ngimel. cc @jgong5
@mingfeima
Hi @glaringlee, I am seeing some upstream breakage during the rebase:
fatal: reference is not a tree: f015d698006c4a11be15b1ebb75b3b9bb317b914
Unable to checkout 'f015d698006c4a11be15b1ebb75b3b9bb317b914' in submodule path 'third_party/tensorpipe'
The last 7 commits from the tensorpipe repo mismatch pytorch third_party:
Any idea how to solve this?
@mingfeima please try again, we just fixed it.
@glaringlee thanks for the prompt fix!
@VitalyFedyunin This looks good to me now, please advise.
facebook-github-bot
left a comment
@glaringlee has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@glaringlee merged this pull request in 686705c.
This PR aims at improving LayerNorm performance on CPU for both forward and backward.
Results on Xeon 6248:
Fine-tuning GPT-2 on the WikiText-2 dataset on a dual-socket machine: time per iteration dropped from 4.69s/it to 3.16s/it, a 1.48x improvement.
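As a quick sanity check on the reported figures, the per-iteration times and the speedup can be recomputed from the progress-bar output quoted in the description (591 iterations, 46:11 before and 31:09 after). This is only a minimal sketch; the helper name `elapsed_seconds` is made up for illustration and is not part of the PR or of any library.

```python
def elapsed_seconds(mm_ss: str) -> int:
    """Convert an 'MM:SS' elapsed-time string from the progress bar to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


# Values copied from the before/after progress bars in the PR description.
ITERATIONS = 591
before = elapsed_seconds("46:11") / ITERATIONS  # seconds per iteration, before
after = elapsed_seconds("31:09") / ITERATIONS   # seconds per iteration, after
speedup = before / after

print(f"before: {before:.2f} s/it, after: {after:.2f} s/it, speedup: {speedup:.2f}x")
# prints "before: 4.69 s/it, after: 3.16 s/it, speedup: 1.48x"
```

The recomputed values agree with the 4.69s/it, 3.16s/it, and 1.48x quoted above.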