[ROCM] Navi21 Enablement 9: Range and Multinomial Kernels #73550
Conversation
CI status: As of commit ad0bca9, Dr. CI reports no failures.
@micmelesse Thanks for merging master into the PR branch. Can you please check the CI failures above for CUDA, just to make sure they are not related to your PR?
@osalpekar @ngimel @malfet This is the final Navi PR. Everything passes.
@ngimel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
malfet
left a comment
LGTM, but just curious: why is num_threads() a signed integer?
Summary: This PR is a follow-up to PRs #69942, #72682, #72809, #73543, #73545, #73546, #73548, and #73549. We are adding support for Navi21 GPUs, which have a warp size of 32. We cannot rely on a compile-time constant, so we have to look up the warp size dynamically when launching a kernel from the host side. Inside device functions this is not needed: the compiler can substitute the correct warp size for the C10_WARP_SIZE constant. Pull Request resolved: #73550 Reviewed By: malfet Differential Revision: D35444958 Pulled By: ngimel fbshipit-source-id: c65f06d3227c23bb097a71fc6c86e3f884114e04
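The host-side lookup described in the summary can be sketched roughly as follows. This is a minimal, hypothetical CUDA/HIP-style example, not the PR's actual code: the kernel, buffer sizes, and block-size choice are all illustrative. The key pattern is that host code queries the warp size from the device properties (32 on NVIDIA and Navi21, 64 on older AMD GPUs), while device code can continue to use the compiler-known warp size (the `warpSize` built-in, or `C10_WARP_SIZE` in PyTorch device code). PyTorch's own host code has a helper for this query, `at::cuda::warp_size()`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel. Inside device code the compiler knows the
// target architecture's warp size, so no runtime lookup is needed.
__global__ void fill_kernel(float* out, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) out[idx] = 1.0f;
}

int main() {
    // Host side: the warp size is not a portable compile-time
    // constant, so query it from the device properties before
    // choosing a launch configuration.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, /*device=*/0);
    int warp_size = prop.warpSize;

    int n = 1024;
    // Size the thread block as a multiple of the queried warp size
    // (the factor of 4 here is an arbitrary illustrative choice).
    int block = 4 * warp_size;
    int grid = (n + block - 1) / block;

    float* out = nullptr;
    cudaMalloc(&out, n * sizeof(float));
    fill_kernel<<<grid, block>>>(out, n);
    cudaDeviceSynchronize();
    cudaFree(out);

    std::printf("launched with warp size %d\n", warp_size);
    return 0;
}
```

The same source then launches with warp-size-appropriate geometry on both a 32-wide Navi21 wavefront and a 64-wide wavefront on older AMD hardware, without baking either width into the launch code.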
Hey @micmelesse.
It was that way in the code before, and I wanted to keep my changes minimal.