Fix static launcher 16-bit float scalars #184065
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/184065
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
✅ No Failures as of commit 495da93 with merge base 75ab8d6.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@claude review these changes
Claude finished @jansel's task in 2m 5s
Code Review: Fix static launcher 16-bit float scalars
LGTM — this is a clean, well-scoped fix. The implementation correctly adds distinct ABI codes for fp16 and bf16.

Details
- Python mapping: …
- C++ parsing: …
- Alignment: Each argument slot is a …
- Tests: Good regression coverage for both fp16 and bf16 scalar launches.

No issues found.
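To make the alignment point concrete, here is a minimal sketch under assumed details, not PyTorch's actual launcher code: `SLOT_SIZE`, `ArgBuffer`, and the setter names are hypothetical. The idea sketched is a fixed-size slot per kernel argument, where a 16-bit scalar fills only the low two bytes of its slot while the driver still receives a pointer to the slot start.

```python
# Toy model of a per-argument slot buffer (hypothetical layout, for illustration only).
import ctypes

SLOT_SIZE = 8  # assumed fixed per-argument slot size


class ArgBuffer:
    """One fixed-size slot per kernel argument; the driver gets a pointer per slot."""

    def __init__(self, nargs: int):
        self.storage = (ctypes.c_uint8 * (SLOT_SIZE * nargs))()
        self.slot_ptrs = [
            ctypes.addressof(self.storage) + i * SLOT_SIZE for i in range(nargs)
        ]

    def set_u16(self, index: int, bits: int) -> None:
        # A 16-bit scalar (fp16 or bf16 bit pattern) occupies only 2 bytes of its slot.
        ctypes.c_uint16.from_address(self.slot_ptrs[index]).value = bits

    def set_f32(self, index: int, value: float) -> None:
        # A regular fp32 scalar occupies 4 bytes of its slot.
        ctypes.c_float.from_address(self.slot_ptrs[index]).value = value


buf = ArgBuffer(2)
buf.set_u16(0, 0x3C00)  # 1.0 encoded as fp16 bits
buf.set_f32(1, 1.0)     # an ordinary fp32 scalar
```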
Stack from ghstack (oldest at bottom):
Encode fp16 and bf16 scalar arguments with distinct launcher ABI codes so static and fast CUDA launchers pack them as 16-bit values instead of 32-bit floats. Add regression coverage for fp16 and bf16 scalar launches.
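As a rough sketch of the packing difference this refers to (the ABI code strings and the `pack_scalar` helper below are made up for illustration and are not the launcher's actual codes), tagging a scalar as fp16 or bf16 rather than fp32 changes how many bytes are copied into the kernel argument:

```python
# Minimal sketch only: why 16-bit float scalars need their own codes.
import struct
import torch


def pack_scalar(value: float, abi_code: str) -> bytes:
    """Pack a Python float into the bytes copied for one kernel argument."""
    if abi_code == "f32":
        return struct.pack("<f", value)  # 4 bytes
    if abi_code == "f16":
        # Round-trip through a float16 tensor to get the 16-bit pattern.
        return torch.tensor([value], dtype=torch.float16).numpy().tobytes()  # 2 bytes
    if abi_code == "bf16":
        # NumPy has no bfloat16, so reinterpret the 2-byte pattern as int16 first.
        return torch.tensor([value], dtype=torch.bfloat16).view(torch.int16).numpy().tobytes()
    raise ValueError(f"unknown ABI code {abi_code!r}")


# With only an fp32 code, a 16-bit scalar would be widened to 4 bytes and a kernel
# expecting 2 bytes would read the wrong bits; distinct codes keep the width at 2 bytes.
assert len(pack_scalar(1.5, "f32")) == 4
assert len(pack_scalar(1.5, "f16")) == 2
assert len(pack_scalar(1.5, "bf16")) == 2
```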
Fixes #158521
Generated by my agent
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo