Conversation

Contributor

@vishwakftw vishwakftw commented May 19, 2019

This PR covers two important points with respect to the QR decomposition:

  • Batching of input matrices (#7500)
  • Adding `some` as an option in `torch.qr`, akin to NumPy's `mode` option (#10538)

Changelog:

  • Enable batching for inputs to `torch.qr`
  • Move the QR decomposition implementation to ATen (CPU and CUDA)
  • Remove the existing implementations in TH/THC
  • Add a `some` option to `torch.qr` that lets users switch between complete and reduced decomposition
  • Update docstrings

Test plan:

  • Add new tests, remove old ones.

Closes #10538
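The `some` flag mirrors the reduced/complete distinction that NumPy exposes through `numpy.linalg.qr`'s `mode` argument. A minimal NumPy sketch of the two output shapes (NumPy's parameter names, not this PR's code):

```python
import numpy as np

# For an m x n input with m >= n:
#   reduced  (torch.qr's some=True):  Q is m x n, R is n x n
#   complete (torch.qr's some=False): Q is m x m, R is m x n
a = np.random.randn(5, 3)

q_red, r_red = np.linalg.qr(a, mode="reduced")
q_full, r_full = np.linalg.qr(a, mode="complete")

assert q_red.shape == (5, 3) and r_red.shape == (3, 3)
assert q_full.shape == (5, 5) and r_full.shape == (5, 3)

# Both factorizations reconstruct the input.
assert np.allclose(q_red @ r_red, a)
assert np.allclose(q_full @ r_full, a)
```

The reduced form is cheaper and sufficient for most uses (least squares, orthonormal bases); the complete form additionally returns an orthonormal basis for the full column space of R^m.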

@pytorchbot pytorchbot added the module: cpu, module: cuda, module: internals, and module: operators labels May 19, 2019
@pytorchbot pytorchbot added the module: cublas label May 20, 2019
@pytorchbot pytorchbot added the module: pybind label May 22, 2019
@pytorchbot pytorchbot added the module: docs and module: tests labels May 23, 2019
@vishwakftw vishwakftw changed the title from "[WIP] Enable batched QR decomposition and add a `some` option" to "Enable batched QR decomposition and add a `some` option" May 23, 2019
@vishwakftw
Contributor Author

@pytorchbot rebase this please

void magmaGeqrf<double>(
magma_int_t m, magma_int_t n, double* dA, magma_int_t ldda,
double* tau, double* dT, magma_int_t* info, bool is_v2) {
if (!is_v2) {
Contributor
I think we can safely remove the `!is_v2` specializations. We stopped supporting MAGMA v1 long ago.

Contributor Author
The `is_v2` specialization is to differentiate between `geqrf2` and `geqrf`. I will add a comment about this to avoid confusion.

Contributor

@soumith soumith left a comment

Please ignore my comments.
I reviewed commit by commit, expecting that my review of each commit would be preserved.
But after reviewing all commits and submitting, only the first commit's comments were posted.
All of those were addressed by you in later commits.

@soumith
Contributor

soumith commented May 28, 2019

@pytorchbot rebase this please

@vishwakftw
Contributor Author

@soumith the pending build / tests have completed successfully.

This is the message on the details page: "GitHub rate limited us while adding a commit status of success: You have triggered an abuse detection mechanism. Please wait a few minutes before you try again."

Contributor

@facebook-github-bot facebook-github-bot left a comment

@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request May 29, 2019
Summary:
This PR covers two important points with respect to the QR decomposition:
- batching of input matrices (#7500)
- adding `some` as an option in `torch.qr` akin to NumPy's `mode` option (#10538)

Changelog:
- Enable batching for inputs to `torch.qr`
- Move QR decomposition implementation to ATen (CPU and CUDA)
- Remove existing implementations in TH/THC
- Add a `some` option to `torch.qr` that will enable users to switch between complete and reduced decomposition
- Modify doc strings
Pull Request resolved: pytorch/pytorch#20689

Differential Revision: D15529230

Pulled By: soumith

fbshipit-source-id: 16af82b1d2db8a3a758fa8a5f798d83f5f950efb
@facebook-github-bot
Contributor

@soumith merged this pull request in f6ec464.

@vishwakftw vishwakftw deleted the cpu-qr-to-aten branch May 29, 2019 17:17
@kvuong2711

kvuong2711 commented Jun 9, 2019

Hi,

How can I use this batched implementation of QR now?

a = torch.randn(3, 4, 5)
q, r = torch.qr(a)

gives me the error `RuntimeError: invalid argument 1: A should be 2 dimensional at /pytorch/aten/src/TH/generic/THTensorLapack.cpp:680`.

@vishwakftw
Contributor Author

Please install from source, or a nightly build dated after May 30. This is not available in any of the current releases.

@kvuong2711

Hi,

I'm curious about the internal implementation of this batched QR. At a high level, does it use a for-loop over all k tensors of size m x n, computing QR sequentially for each of them, or is it a new batched QR implementation that exploits the strengths of the GPU (tensor operations only)? It's terribly slow for millions of 4x4 tensors for me, so I guess it's the former?

@vishwakftw
Contributor Author

Yes, it is the former. The GPU implementation depends on a package called MAGMA. MAGMA has batched support for some of its operations, and in fact it has a batched version of `geqrf`, which is used to obtain the R matrix and the Householder reflectors needed to generate Q. Unfortunately it's not LAPACK-compliant (which makes extracting R difficult), which is why I had to use the former method.
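The loop-over-the-batch strategy described above can be sketched in NumPy (a simplified illustration of the idea, not the actual ATen code):

```python
import numpy as np

def batched_qr(a):
    """Sequentially factorize each matrix in an (..., m, n) batch.

    Mirrors the for-loop strategy described above: each slice is
    handled by an ordinary (unbatched) reduced QR call.
    """
    batch_shape = a.shape[:-2]
    m, n = a.shape[-2:]
    k = min(m, n)
    flat = a.reshape(-1, m, n)
    qs = np.empty((flat.shape[0], m, k))
    rs = np.empty((flat.shape[0], k, n))
    for i, mat in enumerate(flat):  # the sequential per-matrix loop
        qs[i], rs[i] = np.linalg.qr(mat)
    return qs.reshape(*batch_shape, m, k), rs.reshape(*batch_shape, k, n)

a = np.random.randn(3, 4, 5)  # batch of three 4x5 matrices
q, r = batched_qr(a)
assert q.shape == (3, 4, 4) and r.shape == (3, 4, 5)
assert np.allclose(q @ r, a)
```

Because each factorization is independent, the loop body is embarrassingly parallel in principle; the bottleneck noted above is that each iteration is a separate (small) LAPACK/MAGMA call rather than one fused batched kernel.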

@nikhilvyas

nikhilvyas commented Oct 6, 2024

@vishwakftw is the statement "batched qr/eigh basically uses a for-loop over all matrices in the batch" still true? Do you know how hard it would be to have a real parallelized implementation? Specifically in the application I have in mind we only need the Q matrices.

Development

Successfully merging this pull request may close these issues.

QR option to return full matrices

8 participants