Enable batched QR decomposition and add a `some` option
#20689
Conversation
@pytorchbot rebase this please
void magmaGeqrf<double>(
    magma_int_t m, magma_int_t n, double* dA, magma_int_t ldda,
    double* tau, double* dT, magma_int_t* info, bool is_v2) {
  if (!is_v2) {
I think we can safely remove the `!is_v2` specializations; we stopped supporting MAGMA v1 long ago.
The `is_v2` specialization is to differentiate between `geqrf2` and `geqrf`. I will add a comment about this to avoid confusion.
soumith left a comment:
Please ignore my comments. I reviewed commit-by-commit, expecting that my reviews for each commit would be preserved, but after reviewing all commits and submitting, only the first commit's comments were posted. All of those were addressed by you in later commits.
@pytorchbot rebase this please
@soumith the pending build / tests have completed successfully. This is the message on the details page:
facebook-github-bot left a comment:
@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: This PR covers two important points with respect to the QR decomposition:
- batching of input matrices (#7500)
- adding `some` as an option in `torch.qr` akin to NumPy's `mode` option (#10538)

Changelog:
- Enable batching for inputs to `torch.qr`
- Move QR decomposition implementation to ATen (CPU and CUDA)
- Remove existing implementations in TH/THC
- Add a `some` option to `torch.qr` that will enable users to switch between complete and reduced decomposition
- Modify doc strings

Pull Request resolved: pytorch/pytorch#20689
Differential Revision: D15529230
Pulled By: soumith
fbshipit-source-id: 16af82b1d2db8a3a758fa8a5f798d83f5f950efb
Hi, how can I use this batched implementation of QR now? Passing a batched input gives me an error: `RuntimeError: invalid argument 1: A should be 2 dimensional at /pytorch/aten/src/TH/generic/THTensorLapack.cpp:680`
Please install from source, or use a nightly build dated after May 30. This is not available in any of the current releases.
Hi, I'm curious about the internal implementation of this batched QR. In a high-level sense, does it use a for-loop to iterate over all k tensors of size m × n and compute QR sequentially for each of them, or is it a whole new batched QR implementation that exploits the strengths of the GPU (tensor operations only)? It's terribly slow for millions of 4x4 tensors for me, so I guess it's the former?
Yes, it is the former. The GPU implementation depends on a package called MAGMA. They have batched support for some of their operations, and in fact they have a batched version of `geqrf`, which is used to obtain the R matrix and the Householder reflectors that generate Q. Unfortunately it's not LAPACK-compliant (which makes extracting R difficult), which is why I had to use the former method.
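The for-loop behavior described above can be sketched as follows. This is an illustrative stand-in using NumPy's `np.linalg.qr` per matrix, not the actual PyTorch code (which calls LAPACK/MAGMA `geqrf` under the hood); it only shows the sequential-loop structure:

```python
import numpy as np

# Sketch of the per-matrix loop described above, using NumPy as a
# stand-in for the LAPACK/MAGMA calls the real implementation makes.
def batched_qr(a):
    """QR-factorize each matrix in a stack of shape (k, m, n)."""
    qs, rs = [], []
    for mat in a:                       # sequential loop over the batch
        q, r = np.linalg.qr(mat, mode='reduced')
        qs.append(q)
        rs.append(r)
    return np.stack(qs), np.stack(rs)

batch = np.random.default_rng(0).standard_normal((8, 4, 3))
Q, R = batched_qr(batch)

# Each slice reconstructs its input: Q[i] @ R[i] == batch[i].
assert Q.shape == (8, 4, 3) and R.shape == (8, 3, 3)
assert np.allclose(Q @ R, batch)
```

Because the loop runs one factorization at a time, runtime grows linearly with the batch size, which matches the slowdown reported above for millions of small matrices.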
@vishwakftw is the statement "batched qr/eigh basically uses a for-loop over all matrices in the batch" still true? Do you know how hard it would be to have a truly parallelized implementation? Specifically, in the application I have in mind we only need the Q matrices.
This PR covers two important points with respect to the QR decomposition:
- batching of input matrices (batched QR decomposition #7500)
- adding `some` as an option in `torch.qr` akin to NumPy's `mode` option (QR option to return full matrices #10538)

Changelog:
- Enable batching for inputs to `torch.qr`
- Move QR decomposition implementation to ATen (CPU and CUDA)
- Remove existing implementations in TH/THC
- Add a `some` option to `torch.qr` that will enable users to switch between complete and reduced decomposition
- Modify doc strings

Test plan:
Closes #10538
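Since the PR description says `some` is modeled on NumPy's `mode` option, the shape difference between the two settings can be illustrated with NumPy directly (`some=True` corresponds to the reduced factorization, `some=False` to the complete one; the `torch.qr` correspondence noted in the comments is an assumption drawn from that description):

```python
import numpy as np

# Reduced vs. complete QR of a tall m x n matrix, using NumPy's `mode`
# option to mirror the semantics of torch.qr's `some` flag.
a = np.random.default_rng(1).standard_normal((5, 3))  # m=5, n=3

q_red, r_red = np.linalg.qr(a, mode='reduced')     # ~ torch.qr(a, some=True)
q_full, r_full = np.linalg.qr(a, mode='complete')  # ~ torch.qr(a, some=False)

# Reduced: Q is m x n, R is n x n.  Complete: Q is m x m, R is m x n.
assert q_red.shape == (5, 3) and r_red.shape == (3, 3)
assert q_full.shape == (5, 5) and r_full.shape == (5, 3)

# Both factorizations reconstruct the input.
assert np.allclose(q_red @ r_red, a)
assert np.allclose(q_full @ r_full, a)
```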