-
Notifications
You must be signed in to change notification settings - Fork 26.3k
[NNC] fix Half conversion of immediates in Cuda backend #45213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this will get us float{v[i]} < float{0.} right, but won't the dtype of the compare op still be half? Does that cause problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default implementation of each operator in IRMutator will recreate it with the correct types if child nodes are modified.
I can modify the test to reach into the generated IR and check the dtype of the Max op if you like?
|
Oh, also this is the last blocker for enabling |
bertmaher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks for the quick fix.
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nickgg has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
💊 CI failures summary and remediationsAs of commit 3dcc84e (more details on the Dr. CI page):
ci.pytorch.org: 1 failedThis comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group. |
The Cuda HalfChecker casts up all loads and stores of Half to Float, so we do math in Float on the device. It didn't cast up HalfImmediate (ie. constants) so they could insert mixed-size ops. Fix is to do that.