[MPS] Add linalg.householder_product for MPS #166090

Closed
kurtamohler wants to merge 3 commits into gh/kurtamohler/57/base from gh/kurtamohler/57/head

kurtamohler (Collaborator) commented Oct 22, 2025

[ghstack-poisoned]
kurtamohler added a commit that referenced this pull request Oct 22, 2025
pytorch-bot added the ciflow/mps (Run MPS tests (subset of trunk)) and release notes: mps (Release notes category) labels Oct 22, 2025
pytorch-bot (bot) commented Oct 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166090

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4da24cb with merge base 9038a30:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]
kurtamohler added a commit that referenced this pull request Oct 22, 2025
github-actions (Contributor) commented:

Attention! native_functions.yaml was changed

If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs, one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.


Caused by:

threadgroup_barrier(mem_flags::mem_threadgroup);

T H_prod_0_to_i_rc =
calc_matmul_rc(H_prod, H, H_stride_r, H_stride_c, m, r, c);
kurtamohler (Collaborator, Author) commented Oct 22, 2025:

At the moment, performance is much worse than that of the CPU implementation, except in some cases where the number of batches is greater than the number of elements in A times the number of elements in tau.

The vast majority of the runtime is spent in this matrix multiplication. I'm using a naive matmul implementation, so we should be able to get much better performance by switching to a tiled matmul. It should be possible to reuse the tiled matmul defined earlier in this file, so I will look into that.
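
For reference, the approach described in this comment (form each Householder matrix explicitly, then accumulate the product with a full matrix multiply) can be sketched on the CPU in plain Python. This is a minimal sketch, not the PR's Metal kernel. It assumes the documented semantics of `torch.linalg.householder_product` for a single real matrix: H_i = I - tau_i v_i v_iᵀ, where v_i is zero above row i, 1 at row i, and A[r][i] below it, and the result is the first n columns of H_1 H_2 ... H_k. The helpers `householder_vector`, `matmul`, and `householder_product_naive` are hypothetical names chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]


def identity(m: int) -> Matrix:
    return [[1.0 if r == c else 0.0 for c in range(m)] for r in range(m)]


def householder_vector(A: Matrix, i: int) -> List[float]:
    # v_i: zeros above row i, an implicit 1 at row i, A[r][i] below the diagonal.
    m = len(A)
    return [0.0 if r < i else (1.0 if r == i else A[r][i]) for r in range(m)]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    # Naive triple loop: each output element (r, c) is an independent dot product,
    # which is what a one-thread-per-element kernel would compute.
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[r][t] * Y[t][c] for t in range(inner)) for c in range(cols)]
            for r in range(rows)]


def householder_product_naive(A: Matrix, tau: List[float]) -> Matrix:
    m, n = len(A), len(A[0])
    H_prod = identity(m)
    for i, t in enumerate(tau):
        v = householder_vector(A, i)
        # H_i = I - tau_i * v_i v_i^T, formed explicitly as an m x m matrix.
        H = [[(1.0 if r == c else 0.0) - t * v[r] * v[c] for c in range(m)]
             for r in range(m)]
        H_prod = matmul(H_prod, H)  # O(m^3) work per reflector
    # Keep only the first n columns, as householder_product does.
    return [row[:n] for row in H_prod]
```

With k reflectors this costs O(k·m³) arithmetic. Tiling the matmul reduces memory traffic, but it does not change that operation count.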

kurtamohler (Collaborator, Author) commented Oct 24, 2025:

I attempted to improve performance (in this branch) by changing the kernel to generate only the Householder matrices and using the existing do_metal_bmm for the matrix multiply. Performance improved slightly in some cases and decreased in others, but overall it didn't make much of a difference. Maybe the CPU implementation isn't actually doing a series of full matrix multiplies and instead uses a simplified formula; I'll have to look at the LAPACK implementation. Still, this is probably somewhat low priority.
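
A standard formulation avoids forming the Householder matrices at all. LAPACK's unblocked ?org2r, for example, starts from the first n columns of the identity and applies H_k, ..., H_1 in turn, each as a rank-1 update (w = vᵀC, then C = C - tau v wᵀ). That costs O(m·n) per reflector instead of O(m³). The following is a minimal pure-Python sketch of the idea. It assumes a single real m x n matrix A and the documented `torch.linalg.householder_product` semantics, with v_i zero above row i, 1 at row i, and A[r][i] below it. `householder_product_rank1` is a hypothetical name. This is not the PR's kernel, and it is not claimed to be PyTorch's CPU code path.

```python
from typing import List

Matrix = List[List[float]]


def householder_product_rank1(A: Matrix, tau: List[float]) -> Matrix:
    m, n = len(A), len(A[0])
    # Start from the first n columns of the m x m identity.
    Q = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(m)]
    # Apply reflectors right to left: Q = H_1 (H_2 (... (H_k Q))).
    for i in reversed(range(len(tau))):
        t = tau[i]
        v = [0.0 if r < i else (1.0 if r == i else A[r][i]) for r in range(m)]
        # w = v^T Q (length n). Rows above i are skipped because v is zero there.
        w = [sum(v[r] * Q[r][c] for r in range(i, m)) for c in range(n)]
        # Q -= tau * v w^T: a rank-1 update touching rows i..m-1 only, O(m * n).
        for r in range(i, m):
            for c in range(n):
                Q[r][c] -= t * v[r] * w[c]
    return Q
```

Total work is O(k·m·n). The blocked LAPACK routine (?orgqr) additionally groups reflectors into a compact WY representation (via ?larft and ?larfb) so that most of the work becomes matrix-matrix products. In that formulation a tiled GPU matmul would actually do useful work.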

malfet added the topic: improvements (topic category) label Oct 22, 2025
kurtamohler added a commit to kurtamohler/pytorch that referenced this pull request Oct 24, 2025
[ghstack-poisoned]
kurtamohler added a commit that referenced this pull request Oct 24, 2025
kurtamohler (Collaborator, Author) commented:

I will follow up with a performance improvement PR once I understand how to do it.

kurtamohler (Collaborator, Author) commented:

@pytorchbot merge

pytorch-bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Oct 24, 2025
pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

github-actions deleted the gh/kurtamohler/57/head branch November 24, 2025 02:19

Labels: ciflow/mps (Run MPS tests (subset of trunk)), ciflow/trunk (Trigger trunk jobs on your pull request), Merged, open source, release notes: mps (Release notes category), topic: improvements (topic category)


4 participants