
[CUDA] Upgrade cuDNN to 9.15.1 for CUDA 13 builds #169412

Closed
eqy wants to merge 4 commits into pytorch:main from eqy:cuda13cudnn915

Conversation

@eqy
Collaborator

eqy commented Dec 2, 2025

Opening this PR for testing...

Note that we are proposing 9.15 instead of 9.16, as we have not had sufficient signal on 9.16 internally.

NS: Added hacky workaround to install 9.15.1 for torchbench testing

cc @csarofeen @ptrblck @xwang233 @nWEIdia
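
For readers who want to confirm which cuDNN a given build actually ships, the sketch below decodes the integer that `torch.backends.cudnn.version()` reports. It assumes the cuDNN 9 encoding (major * 10000 + minor * 100 + patch, so 9.15.1 would appear as 91501). The helper names `decode_cudnn_version`, `is_at_least`, and `installed_cudnn_version` are hypothetical and are not part of this PR.

```python
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def decode_cudnn_version(encoded: int) -> Version:
    """Split an integer cuDNN version into (major, minor, patch).

    Assumption: cuDNN 9+ encoding, i.e. major * 10000 + minor * 100 + patch.
    Older cuDNN 8 builds used a different multiplier and are not handled here.
    """
    major, rest = divmod(encoded, 10000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def is_at_least(encoded: int, wanted: Version) -> bool:
    """Return True if the encoded version is >= the wanted (major, minor, patch)."""
    return decode_cudnn_version(encoded) >= wanted


def installed_cudnn_version() -> Optional[Version]:
    """Return the cuDNN version PyTorch reports, or None if it is unavailable."""
    try:
        import torch  # imported lazily so the helpers above work without PyTorch
    except ImportError:
        return None
    encoded = torch.backends.cudnn.version()  # None when cuDNN is not available
    return None if encoded is None else decode_cudnn_version(encoded)
```

On a CUDA 13 build, calling `installed_cudnn_version()` and comparing the result against `(9, 15, 1)` is one way to check that the upgrade took effect. `torch.version.cuda` can be printed alongside it to confirm the CUDA toolkit version.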

eqy requested review from a team and jeffdaily as code owners December 2, 2025 21:28
eqy added the module: cudnn, open source, ciflow/trunk, release notes: cudnn, ciflow/h100, and ciflow/b200 labels Dec 2, 2025
pytorch-bot added the topic: not user facing label Dec 2, 2025
@pytorch-bot

pytorch-bot bot commented Dec 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/169412

Note: Links to docs will display an error until the docs builds have been completed.

❌ 29 New Failures, 4 Cancelled Jobs, 9 Pending, 1 Unrelated Failure

As of commit 8978cc8 with merge base a01538f:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

eqy changed the title from "[WIP][CUDA][cuDNN] Upgrade cuDNN to 9.15.1 for CUDA 13 builds" to "[CUDA][cuDNN] Upgrade cuDNN to 9.15.1 for CUDA 13 builds" Dec 5, 2025
@eqy
Collaborator Author

eqy commented Dec 5, 2025

@pytorchmergebot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased cuda13cudnn915 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cuda13cudnn915 && git pull --rebase)

@tinglvv
Collaborator

tinglvv commented Dec 8, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased cuda13cudnn915 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cuda13cudnn915 && git pull --rebase)

@tinglvv
Collaborator

tinglvv commented Dec 8, 2025

@pytorchbot rebase -b main

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased cuda13cudnn915 onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout cuda13cudnn915 && git pull --rebase)

@eqy
Collaborator Author

eqy commented Dec 9, 2025

@pytorchmergebot rebase -b main

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased cuda13cudnn915 onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout cuda13cudnn915 && git pull --rebase)

@eqy
Collaborator Author

eqy commented Dec 9, 2025

@pytorchmergebot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@atalman
Contributor

atalman commented Dec 23, 2025

@pytorchmergebot rebase -b main

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased cuda13cudnn915 onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout cuda13cudnn915 && git pull --rebase)

@malfet
Contributor

malfet commented Dec 23, 2025

@pytorchbot merge -f "Let's try again, all other failures were fixed by #151700, and revert didn't really help"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@atalman
Contributor

atalman commented Dec 23, 2025

@pytorchbot cherry-pick --onto release/2.10 -c critical

@pytorchbot
Collaborator

Cherry picking #169412

Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x bfa6f5e0730dead84017e779e02de6cea768ee33 returned non-zero exit code 1

Auto-merging .github/scripts/generate_binary_build_matrix.py
Auto-merging .github/workflows/generated-linux-aarch64-binary-manywheel-nightly.yml
CONFLICT (content): Merge conflict in .github/workflows/generated-linux-aarch64-binary-manywheel-nightly.yml
Auto-merging .github/workflows/generated-linux-binary-manywheel-nightly.yml
CONFLICT (content): Merge conflict in .github/workflows/generated-linux-binary-manywheel-nightly.yml
error: could not apply bfa6f5e0730... [CUDA] Upgrade cuDNN to 9.15.1 for CUDA 13 builds (#169412)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Details for Dev Infra team Raised by workflow job
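
When triaging a failed cherry-pick like the one above, it can help to pull the conflicting paths out of the git output before resolving them by hand. The following is a minimal sketch, assuming the log format shown in this comment. The function name `conflicted_paths` and the `SAMPLE_LOG` constant are illustrative only and do not come from the PyTorch tooling.

```python
import re
from typing import List

# Matches lines such as:
#   CONFLICT (content): Merge conflict in path/to/file.yml
# Other conflict messages (for example modify/delete) use different wording
# and are intentionally not matched by this sketch.
CONFLICT_RE = re.compile(r"^CONFLICT \([^)]+\): Merge conflict in (?P<path>.+)$")

# Excerpt of the cherry-pick output quoted in the bot comment.
SAMPLE_LOG = """\
Auto-merging .github/scripts/generate_binary_build_matrix.py
Auto-merging .github/workflows/generated-linux-aarch64-binary-manywheel-nightly.yml
CONFLICT (content): Merge conflict in .github/workflows/generated-linux-aarch64-binary-manywheel-nightly.yml
Auto-merging .github/workflows/generated-linux-binary-manywheel-nightly.yml
CONFLICT (content): Merge conflict in .github/workflows/generated-linux-binary-manywheel-nightly.yml
"""


def conflicted_paths(git_output: str) -> List[str]:
    """Return the paths git reported as conflicting, in order of appearance."""
    paths = []
    for line in git_output.splitlines():
        match = CONFLICT_RE.match(line.strip())
        if match:
            paths.append(match.group("path"))
    return paths


if __name__ == "__main__":
    for path in conflicted_paths(SAMPLE_LOG):
        print(path)
```

Here both conflicting files are generated workflow files, while `generate_binary_build_matrix.py` merged cleanly. After resolving the conflicts, the steps in git's own hints apply: `git add` the files, then `git cherry-pick --continue`.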

atalman pushed a commit to atalman/pytorch that referenced this pull request Dec 23, 2025
Opening this PR for testing...

Note that we are proposing 9.15 instead of 9.16 as we have not had sufficient signal on 9.16 internally

NS: Added hacky workaround to install 9.15.1 for torchbench testing

Pull Request resolved: pytorch#169412
Approved by: https://github.com/atalman, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Co-authored-by: Ting Lu <tingl@nvidia.com>
malfet added a commit that referenced this pull request Jan 6, 2026
[CUDA] Upgrade cuDNN to 9.15.1 for CUDA 13 builds (#169412)
krastogi-in pushed a commit to krastogi-in/pytorch that referenced this pull request Jan 9, 2026
krastogi-in pushed a commit to krastogi-in/pytorch that referenced this pull request Jan 9, 2026
krastogi-in pushed a commit to krastogi-in/pytorch that referenced this pull request Jan 9, 2026
krastogi-in pushed a commit to krastogi-in/pytorch that referenced this pull request Jan 9, 2026
krastogi-in pushed a commit to krastogi-in/pytorch that referenced this pull request Jan 9, 2026
…9412)"

This reverts commit aadd016.

Reverted pytorch#169412 on behalf of https://github.com/huydhn due to Sorry for reverting the change, but it seems to cause an import error when running vLLM tests ([comment](pytorch#169412 (comment)))
krastogi-in pushed a commit to krastogi-in/pytorch that referenced this pull request Jan 9, 2026

Labels

autorevert: disable (Disable autorevert for a specific PR)
ci-no-td (Do not run TD on this PR)
ciflow/b200
ciflow/binaries (Trigger all binary build and upload jobs on the PR)
ciflow/h100
ciflow/inductor
ciflow/inductor-periodic
ciflow/inductor-rocm (Trigger "inductor" config CI on ROCm)
ciflow/torchbench
ciflow/trunk (Trigger trunk jobs on your pull request)
ciflow/vllm
Merged
module: cudnn (Related to torch.backends.cudnn, and CuDNN support)
open source
release notes: cudnn
Reverted
test-config/inductor_torchbench
topic: not user facing (topic category)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
