[NNC] fix support for FP16 in CudaCodegen #44209
Conversation
💊 CI failures summary: as of commit 4409f54, ci.pytorch.org reported 1 failed job (more details on the Dr. CI page).
bertmaher left a comment:
I'm still a little nervous here, because the HalfChecker mutation is going to turn Loads of halfs into floats, but the expression tree was constructed assuming all the arithmetic is half-typed. So this probably ends up working syntactically, but the types in the IR are potentially out of sync. I'm not really sure anything will break as a result of this though, so maybe we can push it onto a TODO stack.
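To make the concern concrete, here is a minimal, self-contained sketch (toy classes, not the actual NNC IR types) of how a HalfChecker-style mutation can leave a parent node's dtype stale:

```cuda
#include <iostream>
#include <memory>

// Toy stand-ins for IR nodes; the real NNC classes differ.
enum class Dtype { Half, Float };

struct Expr {
  Dtype dtype;
  explicit Expr(Dtype d) : dtype(d) {}
  virtual ~Expr() = default;
};

struct Load : Expr {
  using Expr::Expr;
};

struct Cast : Expr {
  std::shared_ptr<Expr> src;
  Cast(Dtype d, std::shared_ptr<Expr> s) : Expr(d), src(std::move(s)) {}
};

struct Add : Expr {
  std::shared_ptr<Expr> lhs, rhs;
  // The Add's dtype is fixed at construction from its operands.
  Add(std::shared_ptr<Expr> l, std::shared_ptr<Expr> r)
      : Expr(l->dtype), lhs(std::move(l)), rhs(std::move(r)) {}
};

int main() {
  // Tree built assuming half-typed arithmetic: Add inherits Half.
  auto add = std::make_shared<Add>(std::make_shared<Load>(Dtype::Half),
                                   std::make_shared<Load>(Dtype::Half));

  // HalfChecker-style mutation: wrap each Load in a cast to float...
  add->lhs = std::make_shared<Cast>(Dtype::Float, add->lhs);
  add->rhs = std::make_shared<Cast>(Dtype::Float, add->rhs);

  // ...but the Add still reports Half, so the IR types are out of sync.
  std::cout << "operands: Float, Add dtype: "
            << (add->dtype == Dtype::Half ? "Half" : "Float") << "\n";
}
```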
facebook-github-bot left a comment:
@nickgg has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Codecov Report
@@            Coverage Diff             @@
##           master   #44209      +/-   ##
==========================================
- Coverage   69.24%   69.24%   -0.01%
==========================================
  Files         381      381
  Lines       47573    47573
==========================================
- Hits        32943    32942       -1
- Misses      14630    14631       +1

Continue to review the full report at Codecov.
Fixes a bug where FP16 values could be incorrectly cast to a half type that doesn't have a cast operator. The CUDA-specific cast to float is now inserted while handling the Cast node, rather than as a wrapper around printing Loads and Stores. Two main changes: the HalfChecker now inserts the casts to float explicitly in the IR, and the PrioritizeLoad mutator now consumes both a Load and any Cast that immediately wraps it, as sketched below.
Tested with test_jit_fuser_te.py and test_tensorexpr.py, plus the C++ tests.
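As a rough illustration of the intent (a hand-written sketch, not actual CudaCodegen output; the kernel name and shape are assumptions), half values should be cast to float as they are loaded, the arithmetic done in float, and the result cast back to half only at the store:

```cuda
#include <cuda_fp16.h>

// Hypothetical kernel sketching the code shape the fix aims for.
__global__ void add_half(const __half* x, const __half* y, __half* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float xf = __half2float(x[i]);   // load hoisted into a float temporary
    float yf = __half2float(y[i]);
    out[i] = __float2half(xf + yf);  // single cast back to half on the store
  }
}
```

Hoisting the Cast together with the Load is what keeps the temporaries float-typed; hoisting the Load alone would leave a half-typed temporary with no usable cast operator.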