Integrate NVIDIA cuSolver backend into ATen/Linalg (initial implementation for eig/eigval)#166715
johannesz-codes wants to merge 28 commits into pytorch:main from
Conversation
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/166715. Note: links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV; if your PR is affected, please view it below.
⏳ No failures, 1 pending as of commit b3f50e0 with merge base 4316df8.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"
Didn't find following labels among repository labels: release notes: performance improvement
@pytorchbot label "release notes: cuda" "module: linear algebra"
```cpp
Tensor& linalg_eigvals_out(const Tensor& input, Tensor& values) {
  squareCheckInputs(input, "linalg.eigvals");
  TORCH_CHECK(input.isfinite().all().item<bool>(), "torch.linalg.eigvals: input tensor should not contain infs or NaNs.");
```
Remove as it synchronizes. This is already documented for all ops in https://docs.pytorch.org/docs/stable/notes/numerical_accuracy.html#non-finite-values
Should I remove this from linalg_eig_out as well?
Doing so causes test failures in test_linalg.py.
Maybe this could be addressed in a separate PR?
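To illustrate why the reviewer flags this check: any host-side branch on device-resident data, like `TORCH_CHECK(input.isfinite().all().item<bool>(), ...)`, blocks the host until the GPU catches up. A toy stand-in (pure Python, not torch; `DeviceScalar` is hypothetical) models that synchronization point:

```python
# Toy model of a device-resident scalar: .item() stands in for
# Tensor.item(), which must synchronize with the GPU before the host
# can see the value. Class and names are illustrative only.
class DeviceScalar:
    def __init__(self, value, log):
        self.value = value
        self.log = log  # records implicit synchronization points

    def item(self):
        self.log.append("sync")  # host blocks until the device catches up
        return self.value

log = []
all_finite = DeviceScalar(True, log)
if all_finite.item():   # mirrors the TORCH_CHECK(... .item<bool>()) pattern
    pass                # validation passed
assert log == ["sync"]  # the check itself forced a host-device sync
```

This is why dropping the check matters on CUDA: it turns an otherwise asynchronous op into a blocking one.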
```cpp
auto eigenvalues_cpu = eigenvalues.device().is_cpu() ? eigenvalues : eigenvalues.cpu();
auto eigenvectors_cpu = eigenvectors.device().is_cpu() ? eigenvectors : eigenvectors.cpu();
auto infos_cpu = infos.device().is_cpu() ? infos : infos.cpu();
```
again, just call `.cpu()` unconditionally (it is a no-op for tensors already on the CPU)
Should I fix the failing lintrunner test? The affected line wasn’t touched in this PR.
lezcano left a comment:
LGTM. cc @IvanYashchuk in case he wants to have a quick look. If he doesn't answer by Monday, let's merge it.
Thank you for the contribution! I think this is literally the first OSS contribution to linalg in 2 years :P
regarding the test, if you rebase on
CC @nikitaved |
This perf improvement is definitely user-facing and should have patch notes
```cpp
jobvl,
jobvr,
n,
CUDA_R_64F, // Datentyp der Matrix
```
Non-English comment ("Datentyp der Matrix" is German for "data type of the matrix")? Comments should be in English for consistency.
Left over from testing. Will remove tomorrow.
```cpp
#include <utility>
#include <vector>

#include "jit/tensorexpr/bounds_overlap.h"
```
What do we need this one for?
Don't know how that slipped in there. Removed it; compiles and tests fine.
Seems there was another stray import in the same commit. Probably auto-import kicked in.
```cpp
// move tensors to CPU if they are not already there for post-processing
auto vectors_cpu = vectors.cpu();
auto values_cpu = values.cpu();
auto maybe_complex_vectors_cpu = maybe_complex_vectors.cpu();

vectors = linalg_eig_make_complex_eigenvectors(vectors_cpu, values_cpu, maybe_complex_vectors_cpu);
```
nit: let's change the comment to something like "we move to the CPU because linalg_eig_make_complex_eigenvectors requires that. Performance note: this function can be implemented via a TensorIterator -- and then we can avoid explicit host-device sync, IIUC."
By the way, do we need this in the cuSolver path at all?
As far as I understand, yes. CuSolver seems to output the eigenvectors in the same format used by MAGMA/CPU. The NVIDIA docs mention the same column-pair structure for real datatypes, though the wording is a bit unclear: https://docs.nvidia.com/cuda/cusolver/index.html
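For reference, that column-pair layout can be sketched in pure Python (hypothetical helper, not the ATen code, which operates on Tensors): a real eigenvalue's eigenvector occupies one real column as-is, while a complex-conjugate pair of eigenvalues shares two consecutive columns holding the common real and imaginary parts:

```python
# Sketch of the unpacking that linalg_eig_make_complex_eigenvectors
# performs for real dtypes (illustrative stand-in, not the real code).
# values: eigenvalues as Python complex numbers.
# real_cols: eigenvector columns as lists of floats, geev-packed.
def make_complex_eigenvectors(values, real_cols):
    n = len(values)
    out = [None] * n
    j = 0
    while j < n:
        if values[j].imag == 0.0:
            # real eigenvalue: column j is the eigenvector as-is
            out[j] = [complex(x, 0.0) for x in real_cols[j]]
            j += 1
        else:
            # conjugate pair: columns j and j+1 hold Re and Im parts,
            # giving v_j = re + i*im and v_{j+1} = re - i*im
            re, im = real_cols[j], real_cols[j + 1]
            out[j] = [complex(a, b) for a, b in zip(re, im)]
            out[j + 1] = [complex(a, -b) for a, b in zip(re, im)]
            j += 2
    return out
```

If cuSolver indeed emits the same packing as LAPACK/MAGMA, the existing unpacking helper applies unchanged, which matches the observation above.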
```cpp
vectors = linalg_eig_make_complex_eigenvectors(vectors_cpu, values_cpu, maybe_complex_vectors_cpu);
```

```cpp
vectors.copy_(vectors_cpu.to(vectors.device()));
```
The suggested change

```diff
-    vectors.copy_(vectors_cpu.to(vectors.device()));
+    vectors.copy_(vectors_cpu);
```

is sufficient, since `copy_` already performs the device transfer.
Let's do `vectors_cpu = linalg_eig_make_complex_eigenvectors(vectors_cpu, ...)`. Otherwise, having `vectors = vectors_cpu` (because make_complex_eigenvectors is in-place) looks confusing (I am not sure how operator= is implemented for Tensor objects, and how it is aligned with the clone operation that follows). Do we have tests for this path?
Should be tested in test_eig_compare_backends_cuda_floatXX in test_linalg.py; that is also how I test it. The suggested change tests fine for me.
Ok, but the test ignores the new dispatch strategy. We have to be sure that MAGMA is being dispatched to and tested.
True. Should this just be done via the backends.cuda.preferred_linalg_library() option in Python?
Yes, and we can just parametrize the test with backend names.
Makes sense. I currently loop over backends like in test_svd, but parametrizing the test would be the more elegant approach.
Let's at least make a follow-up PR with the test, unless you are able to sneak it in here before the merge happens :)
:D I don't want to upset the merge... will do. I will make a PR with what I have and start converting all the backend-dependent tests to be properly parametrized in a follow-up.
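The parametrized version could be shaped roughly like this (pure-Python stand-ins so the sketch is self-contained; in test_linalg.py the stand-ins would be torch.linalg.eig calls wrapped in torch.backends.cuda.preferred_linalg_library, and the backend list is an assumption):

```python
# Hypothetical stand-ins for the real test pieces: eig_reference models
# the CPU result, eig_with_backend the CUDA result under one preferred
# linalg backend.
def eig_reference(matrix):
    return sorted(matrix)  # placeholder for the CPU eigendecomposition

def eig_with_backend(matrix, backend):
    return sorted(matrix)  # placeholder for the CUDA result

def check_all_backends(matrix, backends=("cusolver", "magma")):
    # mirrors parametrizing the test over backend names: every backend
    # is compared against the same CPU reference, so each one is
    # guaranteed to be dispatched to and checked
    ref = eig_reference(matrix)
    return [b for b in backends if eig_with_backend(matrix, b) != ref]

assert check_all_backends([3.0, 1.0, 2.0]) == []  # all backends agree
```

Compared with a hand-written loop, each backend then shows up as its own named sub-test, so a MAGMA-only failure is reported as such.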
```cpp
Tensor vectors;
if (!vectors.defined()) {
  vectors = at::empty({0}, input.options());
}
```
Tensor vectors; already makes it undefined, right? So the if path looks redundant. Or am I missing something?
I think this condition was always true. Removed the if, compiles and tests fine.
Force-pushed from 1253540 to b3f50e0
albanD left a comment:
Awesome. Thanks for the quick update.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Integrate NVIDIA cuSolver backend into ATen/Linalg (initial implementation for eig/eigval) (#166715)

### Summary
Adds support for NVIDIA’s cuSolver backend to torch.linalg.eig and torch.linalg.eigvals within the ATen/Linalg framework.

### Motivation
Extending PyTorch’s Linalg backends with NVIDIA’s cuSolver enables faster execution of torch.linalg.eig and torch.linalg.eigvals, complementing the existing MAGMA and CPU implementations. The speedup observed on consumer hardware (RTX 4070 / Ryzen 5700X) is on the order of **2x**, with preliminary testing on HPC hardware (H100, EPYC 9454) suggesting **up to 10x**.

### Details
- Implements cuSolver support for linalg_eig and linalg_eigvals using the interface described in the [NVIDIA cuSolver documentation](https://docs.nvidia.com/cuda/cusolver/index.html#cusolverdnxgeev), introduced in CUDA 12.8 ([CUDA 12.8 release notes](https://docs.nvidia.com/cuda/archive/12.8.0/cuda-toolkit-release-notes/index.html)).
- Follows the existing MAGMA backend design, adapting it for cuSolver’s cusolverDnXgeev API.
- Integrates with the existing eig/eigvals dispatch mechanism.
- No automatic CPU↔GPU backend switching. (Happy to discuss.)
- Verified via existing Linalg test coverage; no new tests introduced in this PR.
- Tested successfully against test_linalg.py, including the slow test suites.
- Tested the MAGMA fallback successfully using CUDA 12.4 (observed unrelated test failures).

### Impact
- Enables much faster solving of eigenvalue problems.
- Maintains numerical consistency and test stability across backends.
- No change to public API or user-facing behavior.

Special thanks to @albanD for prior feedback and discussions regarding the PR and @lezcano for feedback on the related testing PR https://github.com/pytorch/pytorch/pull/166322.

Happy to discuss the backend dispatch strategy; results from performance and stability testing can be seen at https://dev-discuss.pytorch.org/t/cusolver-dnxgeev-faster-cuda-eigenvalue-calculations/3248/7

Pull Request resolved: #166715
Approved by: https://github.com/lezcano, https://github.com/albanD
cc @jianyuh @nikitaved @mruberry @walterddr @xwang233 @lezcano @albanD