
[ATen][CUDA] CUTLASS matmuls: add sm_103a flag #162956

Closed
Aidyn-A wants to merge 2 commits into pytorch:main from Aidyn-A:cuda_cutlass_matmuls_add_sm103a

Conversation

@Aidyn-A
Collaborator

@Aidyn-A Aidyn-A commented Sep 15, 2025

This PR adds an sm_103a flag for GroupMM and RowwiseScaledMM. Unlike #161399, it only adds the flag, since support for sm_103a matmuls is expected to land in CUTLASS v4.2 (see #161399 (comment)).

cc @ptrblck @msaroufim @eqy @jerryzh168 @manuelcandales @SherlockNoMad @angelayi

@Aidyn-A Aidyn-A requested review from cyyever and drisspg September 15, 2025 11:26
@Aidyn-A Aidyn-A self-assigned this Sep 15, 2025
@Aidyn-A Aidyn-A added the following labels on Sep 15, 2025: module: cuda (Related to torch.cuda, and CUDA support in general), release notes: cuda (release notes category), topic: not user facing (topic category), matrix multiplication, module: floatx (formerly float8; for torch.float8_e5m2 and torch.float8_e4m3 and other sub 8-bit float types), module: core aten (Related to change to the Core ATen opset)
@pytorch-bot

pytorch-bot bot commented Sep 15, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162956

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 642b74f with merge base 814ba34:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Aidyn-A Aidyn-A marked this pull request as ready for review September 15, 2025 11:26
@Aidyn-A
Collaborator Author

Aidyn-A commented Sep 15, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 15, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: linux-binary-manywheel / manywheel-py3_12-cuda12_8-build / build

Details for Dev Infra team Raised by workflow job

Collaborator

@eqy eqy left a comment


Looks like it needs to be disabled for CUDA 12.8 and earlier

@Aidyn-A
Collaborator Author

Aidyn-A commented Sep 16, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
This PR adds an `sm_103a` flag for GroupMM and RowwiseScaledMM. Unlike pytorch#161399, it only adds the flag, since support for `sm_103a` matmuls is expected to land in CUTLASS v4.2 (see pytorch#161399 (comment)).

Pull Request resolved: pytorch#162956
Approved by: https://github.com/eqy, https://github.com/Skylion007
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025

Labels

ciflow/trunk (Trigger trunk jobs on your pull request)
matrix multiplication
Merged
module: core aten (Related to change to the Core ATen opset)
module: cuda (Related to torch.cuda, and CUDA support in general)
module: floatx (formerly float8; for torch.float8_e5m2 and torch.float8_e4m3 and other sub 8-bit float types)
open source
release notes: cuda (release notes category)
topic: not user facing (topic category)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants