Add BFloat16 support for smooth_l1_loss on CPU #62558
Conversation
💊 CI failures summary and remediations
As of commit 9f09c88 (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚
This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the (internal) Dr. CI Users group.
Hi! Can you please clarify why we can't use https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/cpu/vec/vec256/vec256_bfloat16.h#L154 here?
@VitalyFedyunin Hi! This is because Vectorized<BFloat16> is cast to Vectorized<float> at the beginning, which reduces the type-conversion overhead of the intermediate operations.
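To illustrate the trade-off being discussed, here is a hedged pure-Python sketch (not the ATen vectorized code): it emulates bfloat16 by truncating a float32 to its top 16 bits, then counts how many conversions each strategy performs. Converting every intermediate result back to bfloat16 costs several conversions per element, while casting to float up front converts only once per output element.

```python
import struct

CONVERSIONS = 0

def as_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of a float32.
    (This truncates; the real conversion rounds to nearest even.)"""
    global CONVERSIONS
    CONVERSIONS += 1
    bits = struct.unpack('<I', struct.pack('<f', x))[0] & 0xFFFF0000
    return struct.unpack('<f', struct.pack('<I', bits))[0]

xs = [0.1 * i for i in range(100)]
ys = [0.1 * i + 0.05 for i in range(100)]

# Strategy 1: round every intermediate result back to bf16.
CONVERSIONS = 0
per_op = [as_bf16(0.5 * as_bf16(as_bf16(x - y) * as_bf16(x - y)))
          for x, y in zip(xs, ys)]
n_per_op = CONVERSIONS        # 4 conversions per element

# Strategy 2: compute in float, convert only the final result.
CONVERSIONS = 0
cast_once = [as_bf16(0.5 * (x - y) * (x - y)) for x, y in zip(xs, ys)]
n_cast_once = CONVERSIONS     # 1 conversion per element

print(n_per_op, n_cast_once)
```

The counts (400 vs 100 here) are only illustrative; in the actual kernel the win comes from converting whole Vectorized<BFloat16> lanes to Vectorized<float> once per vector load.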
Overall looks good, I would like to see benchmark numbers for different input sizes. |
a880982 to
45f51e0
Compare
Hi~ The rounding error (the greatest difference compared with float) was tested on an Intel(R) Core(TM) i7-10700K CPU. In this case, casting Vectorized<BFloat16> to Vectorized<float> at the beginning gives no performance advantage, but it reduces rounding errors.
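As a rough illustration of how such a rounding-error measurement can be made (a hedged Python sketch with a truncating bfloat16 emulation, not the actual test harness), the comparison below takes bf16 inputs and outputs but does the arithmetic in float, and records the greatest difference from a pure-float reference:

```python
import random
import struct

def as_bf16(x: float) -> float:
    """Emulate bfloat16 by truncating a float32 to its top 16 bits
    (the real conversion rounds to nearest even)."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0] & 0xFFFF0000
    return struct.unpack('<f', struct.pack('<I', bits))[0]

def smooth_l1(x: float, y: float, beta: float = 1.0) -> float:
    d = abs(x - y)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta

random.seed(0)
max_err = 0.0
for _ in range(10000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    ref = smooth_l1(x, y)                             # float reference
    got = as_bf16(smooth_l1(as_bf16(x), as_bf16(y)))  # bf16 in/out, float math
    max_err = max(max_err, abs(got - ref))

print(max_err)
```

With only about 8 bits of significand in bfloat16, the error here is dominated by rounding the inputs and the final result, which is exactly what computing the intermediates in float avoids adding to.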
Codecov Report
@@ Coverage Diff @@
## master #62558 +/- ##
==========================================
+ Coverage 66.37% 66.76% +0.38%
==========================================
Files 739 695 -44
Lines 94299 90736 -3563
==========================================
- Hits 62595 60580 -2015
+ Misses 31704 30156 -1548
When the input becomes larger, the second method achieves better performance.
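A hedged micro-benchmark sketch of why the cast-once strategy wins as inputs grow (plain Python with an emulated bfloat16; a real comparison would benchmark the ATen kernels themselves): per-op conversion pays a fixed conversion cost on every intermediate, so its overhead scales with input size, while casting once keeps the loop body in plain float arithmetic.

```python
import struct
import timeit

def as_bf16(x: float) -> float:
    """Truncating bfloat16 emulation (the real conversion rounds to nearest even)."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0] & 0xFFFF0000
    return struct.unpack('<f', struct.pack('<I', bits))[0]

xs = [0.001 * i for i in range(10000)]

def per_op():
    # Convert around every intermediate operation.
    acc = 0.0
    for v in xs:
        acc = as_bf16(acc + as_bf16(v * v))
    return acc

def cast_once():
    # Accumulate in float, convert only the final result.
    acc = 0.0
    for v in xs:
        acc += v * v
    return as_bf16(acc)

t_per_op = timeit.timeit(per_op, number=20)
t_cast_once = timeit.timeit(cast_once, number=20)
print(t_per_op, t_cast_once)
```

The absolute timings are meaningless in Python; the point is the scaling: per_op performs two emulated conversions per element, cast_once performs one per reduction.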
956916f to
7bf43bb
Compare
CI Flow Status
⚛️ CI Flow Ruleset - Version:
You can add a comment to the PR and tag @pytorchbot with the following commands:

# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun
# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.
Rebased @VitalyFedyunin |
436f534 to
2dc36e5
Compare
Hi @VitalyFedyunin, could you please review it? Thank you.
042a8d5 to
c280eb9
Compare
@frank-wei has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
frank-wei left a comment
The bf16 is cast to float to reduce rounding error, and perf looks good.
@frank-wei has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: Add BFloat16 support for smooth_l1_loss on CPU.
Pull Request resolved: #62558
Reviewed By: H-Huang
Differential Revision: D34897859
Pulled By: frank-wei
fbshipit-source-id: a52138c89852642db78f5f3083d05873f3cdec3a
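For reference, the operation this PR extends to BFloat16 is smooth L1 loss. A minimal pure-Python sketch of the formula (mean reduction and the default beta=1.0, matching torch.nn.functional.smooth_l1_loss semantics; not the ATen kernel itself):

```python
def smooth_l1_loss(input, target, beta=1.0):
    """Smooth L1 loss with mean reduction:
       0.5 * d**2 / beta   if |d| < beta
       |d| - 0.5 * beta    otherwise,   where d = input - target.
    """
    total = 0.0
    for x, y in zip(input, target):
        d = abs(x - y)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(input)

print(smooth_l1_loss([0.5, 2.0], [0.0, 0.0]))  # (0.125 + 1.5) / 2 = 0.8125
```

The quadratic region near zero is what makes the loss sensitive to small differences, which is why the rounding-error discussion above matters for a low-precision dtype like bfloat16.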
Hey @CaoE. |
Add BFloat16 support for smooth_l1_loss on CPU.