[CUDA][Thor] Enable CUTLASS matmuls on Thor #164836
Aidyn-A wants to merge 1 commit into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164836
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure) As of commit 6c569b8 with merge base 2f023bf (one job is marked as unstable, possibly due to flakiness on trunk).
This comment was automatically generated by Dr. CI and updates every 15 minutes.
+ const bool sm11x = properties != nullptr && properties->major == 11;

- if (sm10x) {
+ if (sm10x || sm11x) {
Should this also be enabled on sm120x or nah?
As far as I know, sm100 and sm110 are compatible, but sm120 is completely different from those two.
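For context, a minimal sketch of how a guard like sm11x can be derived from the CUDA runtime; the function name and wiring here are illustrative, not the PR's actual code:

```cpp
#include <cuda_runtime.h>

// Illustrative helper: report whether the given device is in the
// compute-capability families this PR targets. major == 10 covers
// sm100-class parts, major == 11 covers Thor (sm11x); sm120 (major == 12)
// is excluded, matching the discussion above.
bool is_sm10x_or_sm11x(int device) {
  cudaDeviceProp prop{};
  if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
    return false;
  }
  return prop.major == 10 || prop.major == 11;
}
```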
  if (sm10x || sm11x) {
    if (small) {
      bf16bf16_grouped_gemm_impl_sm90_sm100<
          cutlass::arch::Sm100,
Wait, don't we need a separate instantiation here with cutlass::arch::Sm110?
Nope, cutlass::arch::Sm110 does not exist in CUTLASS. cutlass::arch::Sm101 technically exists: https://github.com/NVIDIA/cutlass/blob/a2439551c765c5393aebe557ee75d3a0412d2211/include/cutlass/arch/arch.h#L104-L106
but it is not used anywhere in CUTLASS. I was not able to compile anything with it.
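To make the answer concrete, a small self-contained sketch of the pattern the diff lands on; only the launcher name and cutlass::arch::Sm100 come from the PR, everything else is a simplified stand-in:

```cpp
#include <cutlass/arch/arch.h>

// Simplified stand-in for the PR's launcher, which is also templated on tile
// shapes and takes tensors, strides, and a stream.
template <typename ArchTag>
void bf16bf16_grouped_gemm_impl_sm90_sm100() {
  // ... instantiate and launch the CUTLASS grouped GEMM for ArchTag ...
}

void dispatch(bool sm10x, bool sm11x) {
  if (sm10x || sm11x) {
    // Thor (sm11x) reuses the Sm100 arch tag: CUTLASS declares Sm101 but
    // ships no kernels for it, so there is no separate tag to instantiate.
    bf16bf16_grouped_gemm_impl_sm90_sm100<cutlass::arch::Sm100>();
  }
}
```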
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict.

Successfully rebased: force-pushed 5cddc35 to c203e94, then c203e94 to 6c569b8.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This PR enables special matmuls on Thor devices. This includes row-wise scaled matmul on fp8 and group gemm on bfloat16.
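For illustration, a hedged sketch of exercising the fp8 row-wise path through the C++ API on a Thor device; at::_scaled_mm's signature has shifted across releases, so the exact arguments below are assumptions rather than a canonical example:

```cpp
#include <ATen/ATen.h>

// Assumed usage: row-wise scaled fp8 matmul. scale_a is (M, 1) and scale_b
// is (1, N) in fp32, and mat2 must be column-major, hence the transpose.
int main() {
  auto opts = at::TensorOptions().device(at::kCUDA);
  auto a = at::randn({64, 128}, opts).to(at::kFloat8_e4m3fn);
  auto b = at::randn({32, 128}, opts).to(at::kFloat8_e4m3fn).t();
  auto scale_a = at::ones({64, 1}, opts);
  auto scale_b = at::ones({1, 32}, opts);
  auto out = at::_scaled_mm(a, b, scale_a, scale_b,
                            /*bias=*/{}, /*scale_result=*/{},
                            /*out_dtype=*/at::kBFloat16);
  return 0;
}
```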