[MPS] Support large tensors in `torch.cat` by kurtamohler · Pull Request #164416 · pytorch/pytorch

kurtamohler · 2025-10-01T23:03:34Z

Stack from ghstack (oldest at bottom):

-> [MPS] Support large tensors in torch.cat #164416

Fixes #164415

[ghstack-poisoned]

pytorch-bot · 2025-10-01T23:03:38Z

✅ No Failures

As of commit 20ff360 with merge base 24d69c5:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Fixes #164415 ghstack-source-id: c35e34e Pull-Request: #164416

test/test_mps.py

kurtamohler · 2025-10-01T23:21:54Z

aten/src/ATen/native/mps/operations/Shape.mm


+  has_large_tensor |= isTooLargeForMPSGraph(out);
+
+  if (has_large_tensor) {


I wanted to check whether the alternate implementation also works correctly for smaller sizes, so I temporarily changed this condition to always be true. I ran `python -m pytest test/test_mps.py -k output_match_cat` and it passed.

It might be a good idea to cover this in CI too. To do that, we could add a non-public Python API (either a global flag or a non-public function) that forces the alternate implementation even when the tensors are small. Then we could add a test in test_mps.py that runs the OpInfo cases for cat using the alternate implementation.

That might be overkill, though. Let me know if it seems like something we'd want to do.

malfet · 2025-10-08

Is there a reason not to use it unconditionally? I suspect the MPSGraph construction overhead for small tensors is significant, and perf for medium-sized tensors should be the same.

kurtamohler · 2025-10-09

Sounds good, I'll make a follow-up PR that replaces the MPSGraph impl.

kulinseth

LGTM

malfet

Probably looks fine to me, though I think it would be good to have an implementation that is more perf-aware and could completely replace MPSGraph.
I guess that to achieve that, one needs fast-path kernel variants for storage-dense tensors, and maybe just one flavor that supports type casts (using an if condition rather than having every possible permutation of the kernels; see the copy kernel for an example).
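
A rough way to picture that suggestion, written in plain Python rather than Metal: a bulk-copy fast path for dense inputs, and a single generic path that handles both arbitrary strides and an optional cast through a runtime branch. This is only a sketch under assumptions. The `(storage, shape, strides)` tuples, the zero storage offset, and the names `is_dense` and `cat_dim0` are all hypothetical and are not taken from the PR's kernels.

```python
def is_dense(shape, strides):
    """Return True when the strides describe a contiguous row-major layout."""
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size != 1 and stride != expected:
            return False
        expected *= size
    return True


def cat_dim0(inputs, cast=None):
    """Concatenate 2-D inputs along dim 0 into a flat row-major list.

    Each input is a (storage, shape, strides) tuple whose storage is a flat
    list starting at offset 0 (an assumption made for this sketch).
    """
    out = []
    for storage, shape, strides in inputs:
        rows, cols = shape
        if cast is None and is_dense(shape, strides):
            # Fast path: dense input and no cast, so copy the block in bulk.
            out.extend(storage[: rows * cols])
            continue
        # Generic path: strided indexing, with the cast handled by a branch
        # instead of a separate variant per dtype combination.
        for r in range(rows):
            for c in range(cols):
                value = storage[r * strides[0] + c * strides[1]]
                if cast is not None:
                    value = cast(value)
                out.append(value)
    return out
```

For example, `cat_dim0([([1, 2, 3, 4], (2, 2), (2, 1)), ([5, 6, 7, 8], (2, 2), (1, 2))])` takes the fast path for the first input and the strided path for the second, which is a transposed view.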

malfet · 2025-10-08T21:40:27Z

aten/src/ATen/native/mps/operations/Shape.mm


aten/src/ATen/native/mps/operations/Shape.mm

aten/src/ATen/native/mps/kernels/Shape.metal

[ghstack-poisoned]

Fixes #164415 ghstack-source-id: ec37cce Pull-Request: #164416

kurtamohler · 2025-10-09T17:57:25Z

@pytorchbot merge

pytorchmergebot · 2025-10-09T17:59:33Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot · 2025-10-09T17:59:53Z

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

pull / linux-jammy-py3.10-gcc11 / test (docs_test, 1, 1, linux.2xlarge)

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

kurtamohler · 2025-10-13T16:49:20Z

@pytorchbot merge

pytorchmergebot · 2025-10-13T16:51:16Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Fixes pytorch#164415 Pull Request resolved: pytorch#164416 Approved by: https://github.com/malfet

Update

3a6433c

[ghstack-poisoned]

kurtamohler requested review from kulinseth and malfet as code owners October 1, 2025 23:03

kurtamohler added a commit that referenced this pull request Oct 1, 2025

[MPS] Support large tensors in torch.cat

84f69e4

Fixes #164415 ghstack-source-id: c35e34e Pull-Request: #164416

pytorch-bot bot added the ciflow/mps and release notes: mps labels Oct 1, 2025

pytorchbot added the open source label Oct 1, 2025

kulinseth reviewed Oct 2, 2025

malfet approved these changes Oct 8, 2025

Update

20ff360

[ghstack-poisoned]

kurtamohler added a commit that referenced this pull request Oct 9, 2025

[MPS] Support large tensors in torch.cat

66b9d4f

Fixes #164415 ghstack-source-id: ec37cce Pull-Request: #164416

pytorch-bot bot added the ciflow/trunk label Oct 9, 2025

pytorchmergebot added the merging label Oct 9, 2025

pytorchmergebot removed the merging label Oct 9, 2025

pytorchmergebot added the merging label Oct 13, 2025

pytorchmergebot closed this in 83cbba8 Oct 13, 2025

pytorchmergebot added Merged and removed merging labels Oct 13, 2025

kurtamohler mentioned this pull request Oct 13, 2025

Remove MPSGraph impl of torch.cat in favor of Metal kernel #165350

Closed

Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025

[MPS] Support large tensors in torch.cat (pytorch#164416)

2214701

Fixes pytorch#164415 Pull Request resolved: pytorch#164416 Approved by: https://github.com/malfet

github-actions bot deleted the gh/kurtamohler/55/head branch November 13, 2025 02:17

