
Fix static launcher 16-bit float scalars #184065

Draft

jansel wants to merge 1 commit into gh/jansel/727/base from gh/jansel/727/head

Conversation


jansel (Contributor) commented May 17, 2026

Stack from ghstack (oldest at bottom):

Encode fp16 and bf16 scalar arguments with distinct launcher ABI codes so static and fast CUDA launchers pack them as 16-bit values instead of 32-bit floats. Add regression coverage for fp16 and bf16 scalar launches.
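
A quick illustration of the size mismatch this fixes, using Python's struct module rather than the launcher code itself: packing a scalar with the float32 code writes 4 bytes where a 16-bit kernel parameter expects 2.

```python
import struct

v = 1.5
# IEEE 754 binary16 ('e'): the 2 bytes an fp16 kernel argument expects.
print(struct.pack("<e", v).hex(), len(struct.pack("<e", v)))  # 003e 2
# float32 ('f'): the 4 bytes the old shared 'f' ABI code produced.
print(struct.pack("<f", v).hex(), len(struct.pack("<f", v)))  # 0000c03f 4
```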

Fixes #158521

Generated by my agent

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

[ghstack-poisoned]

pytorch-bot Bot commented May 17, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/184065

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 495da93 with merge base 75ab8d6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


pytorch-bot Bot commented May 17, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


jansel (Contributor, Author) commented May 17, 2026

@claude review these changes


claude Bot commented May 17, 2026

Claude finished @jansel's task in 2m 5s


Code Review: Fix static launcher 16-bit float scalars

  • Gather context and read changed files
  • Analyze the implementation for correctness
  • Check test coverage
  • Provide review feedback

LGTM — this is a clean, well-scoped fix. The implementation correctly adds distinct ABI codes for fp16 ('e') and bf16 ('y') so the static and fast CUDA launchers pack them as 2-byte values instead of incorrectly treating them as 32-bit floats.

Details

Python mapping (static_triton_launcher.py:165-166): Previously both fp16 and bf16 mapped to 'f' (float32), which caused the C++ side to write a 4-byte float into the kernel argument slot instead of the expected 2-byte half-precision value. Now they map to 'e' and 'y' respectively.
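
For illustration, the corrected mapping looks roughly like this (a sketch with hypothetical names; the actual table and keys in static_triton_launcher.py may differ, and 'd' as double is inferred from the "Oefd" assertion in the tests):

```python
# Hypothetical names -- a sketch of the per-dtype ABI codes, not the
# actual dictionary in static_triton_launcher.py.
SCALAR_ABI_CODES = {
    "fp32": "f",  # 4-byte float
    "fp64": "d",  # 8-byte double (inferred from the "Oefd" arg_tys)
    "fp16": "e",  # 2-byte IEEE half (previously collapsed to 'f')
    "bf16": "y",  # 2-byte bfloat16 (previously collapsed to 'f')
}
```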

C++ parsing (cuda.cpp and xpu.cpp): The new case 'e' / case 'y' branches use convertType<c10::Half> and convertType<c10::BFloat16> with THPUtils_unpackDouble as the Python converter. The flow is:

  1. THPUtils_unpackDouble extracts a double from the Python object
  2. static_cast<c10::Half>(double_val) narrows the double to the 2-byte c10::Half (or c10::BFloat16) representation
  3. The 2-byte result is written to the start of the zero-initialized 8-byte uint64_t slot
  4. CUDA's cuLaunchKernel reads exactly 2 bytes from the kernelArgs pointer — correct behavior
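
The same flow can be mimicked in pure Python (a minimal sketch, assuming little-endian layout; struct's 'e' format is IEEE binary16, which matches what convertType<c10::Half> produces):

```python
import struct

def pack_half_arg(py_obj: float) -> bytes:
    # 1. Extract a double from the Python object (THPUtils_unpackDouble analog).
    d = float(py_obj)
    # 2. Narrow to 2-byte IEEE binary16 (static_cast<c10::Half> analog).
    half_bytes = struct.pack("<e", d)
    # 3. Write the 2 bytes at the start of a zero-initialized 8-byte slot.
    slot = bytearray(8)
    slot[:2] = half_bytes
    # 4. cuLaunchKernel would read only the low 2 bytes from this slot.
    return bytes(slot)

print(pack_half_arg(1.5).hex())  # 003e000000000000
```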

Alignment: Each argument slot is a uint64_t (8-byte aligned), and the pointers in kernelArgs point to the start of each slot. CUDA reads the natural size of the kernel parameter (2 bytes for half types), so the zero-padding in the upper 6 bytes is harmless.
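
A small ctypes sketch of that slot layout (illustrative only, not launcher code): the half value occupies the low 2 bytes of a zeroed uint64 slot, and the upper 6 bytes stay zero.

```python
import ctypes
import struct

slot = ctypes.c_uint64(0)  # zero-initialized 8-byte argument slot
ctypes.memmove(ctypes.addressof(slot), struct.pack("<e", 1.5), 2)
raw = slot.value.to_bytes(8, "little")
assert raw[:2] == struct.pack("<e", 1.5)  # the 2 bytes CUDA reads
assert raw[2:] == b"\x00" * 6             # harmless zero-padding
```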

Tests: Good regression coverage for both TestStaticTritonLauncher and TestFastCudaLauncher, testing fp16 and bf16 separately. The bf16 test correctly gates on SM80OrLater. The arg_tys assertions ("Oefd" and "Oyfd") verify the encoding, and the numerical equality checks verify end-to-end correctness.

No issues found.

