
Conversation

@malfet
Contributor

@malfet malfet commented Sep 16, 2025

Stack from ghstack (oldest at bottom):

Update the driver so that CI machines are capable of running CUDA-13 tests. Unfortunately, this upgrade regresses the NUMBA integration, so live-patch it with NVIDIA/numba-cuda@6e08c9d

This fix was suggested in #162878 (comment)
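
For reference, a live patch of this kind can be applied at CI-setup time by fetching the upstream fix as a patch file and applying it to the installed package. The sketch below is illustrative only: the URL shape, the site-packages layout, and the `patch` invocation are assumptions, not the exact mechanism used by the PyTorch CI scripts.

```python
"""Minimal sketch: live-patch the installed numba CUDA driver with an upstream fix.

Illustrative only; the patch URL, paths, and options are assumptions.
"""
import subprocess
import sysconfig
import urllib.request
from pathlib import Path

# Assumed URL form for the upstream fix NVIDIA/numba-cuda@6e08c9d.
PATCH_URL = "https://github.com/NVIDIA/numba-cuda/commit/6e08c9d.patch"


def live_patch_numba_cuda() -> None:
    # Download the patch next to the installed packages, then apply it in place.
    site_packages = Path(sysconfig.get_paths()["purelib"])
    patch_file = site_packages / "numba-cuda-6e08c9d.patch"
    with urllib.request.urlopen(PATCH_URL) as resp:
        patch_file.write_bytes(resp.read())
    # --forward keeps the step idempotent if the CI setup script runs twice;
    # -p1 assumes the usual a/... b/... prefixes in a GitHub-generated patch.
    subprocess.run(
        ["patch", "-p1", "--forward", "-d", str(site_packages), "-i", str(patch_file)],
        check=True,
    )


if __name__ == "__main__":
    live_patch_numba_cuda()
```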

[ghstack-poisoned]
@malfet malfet requested a review from a team as a code owner September 16, 2025 21:49
@pytorch-bot

pytorch-bot bot commented Sep 16, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163111

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 102 Pending

As of commit 5df64e3 with merge base cfc539f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

malfet added a commit that referenced this pull request Sep 16, 2025
ghstack-source-id: 1458bee
Pull Request resolved: #163111
@pytorch-bot pytorch-bot bot added the topic: not user facing label Sep 16, 2025
@malfet malfet added the ciflow/trunk label Sep 16, 2025
@malfet malfet marked this pull request as draft September 16, 2025 21:53
@malfet malfet added the ciflow/periodic label Sep 16, 2025
[ghstack-poisoned]
malfet added a commit that referenced this pull request Sep 16, 2025
ghstack-source-id: c9afa12
Pull Request resolved: #163111
[ghstack-poisoned]
malfet added a commit that referenced this pull request Sep 17, 2025
ghstack-source-id: 2479fa4
Pull Request resolved: #163111
[ghstack-poisoned]
malfet added a commit that referenced this pull request Sep 17, 2025
ghstack-source-id: 8670b7d
Pull Request resolved: #163111
@malfet malfet marked this pull request as ready for review September 17, 2025 01:37
@malfet malfet changed the title [DoNotMerge] Test new driver [CI] Update NVIDIA driver to 580.82.07 Sep 17, 2025
- name: Get the workflow type for the current user
  id: set-condition
  run: |
    curr_branch="${{ inputs.curr_branch }}"
Contributor

I guess this is just a temp change to bypass the recent issue with no-runner-experiment?

Contributor

@huydhn huydhn left a comment

Stamped to unblock; the PR needs to be cleaned up before landing


To make CI machines capable of running CUDA-13 tests. Unfortunately, this upgrade regresses NUMBA integration, so live patch it with NVIDIA/numba-cuda@6e08c9d


[ghstack-poisoned]
malfet added a commit that referenced this pull request Sep 17, 2025
ghstack-source-id: 6726486
Pull Request resolved: #163111

To make CI machines capable of running CUDA-13 tests. Unfortunately, this upgrade regresses NUMBA integration, so live patch it with NVIDIA/numba-cuda@6e08c9d

This fix was suggested in #162878 (comment)


[ghstack-poisoned]
malfet added a commit that referenced this pull request Sep 17, 2025
ghstack-source-id: 86160e8
Pull Request resolved: #163111
@malfet
Contributor Author

malfet commented Sep 17, 2025

@pytorchbot merge -f "Lint is green, signal has been green previously"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report their signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@malfet
Contributor Author

malfet commented Sep 17, 2025

@pytorchbot merge -f "Take two"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report their signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@nWEIdia
Collaborator

nWEIdia commented Sep 17, 2025

I suppose we need to lock the numba version for a while for this patch to apply successfully? [A different numba version may have slightly different line numbers in driver.py.]
Could you please note which numba version the patch was made against?
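
One way to address this concern, sketched below under assumptions (the pinned version prefix is a placeholder, not something stated in this PR), is to gate the live patch on the numba release it was prepared for, so an unexpected version bump skips the patch instead of mis-applying it.

```python
# Hypothetical guard: only apply the live patch against the numba release it
# was generated for, since the patch hunks reference specific spots in driver.py.
from importlib.metadata import PackageNotFoundError, version

EXPECTED_NUMBA_PREFIX = "0.61"  # placeholder for the version the patch targets


def should_apply_numba_patch() -> bool:
    try:
        installed = version("numba")
    except PackageNotFoundError:
        # numba is not installed at all, so there is nothing to patch.
        return False
    if not installed.startswith(EXPECTED_NUMBA_PREFIX):
        print(f"numba {installed} does not match {EXPECTED_NUMBA_PREFIX}.*; skipping live patch")
        return False
    return True


if __name__ == "__main__":
    print("apply patch:", should_apply_numba_patch())
```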

mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
To make CI machines capable of running CUDA-13 tests. Unfortunately, this upgrade regresses NUMBA integration, so live patch it with NVIDIA/numba-cuda@6e08c9d

This fix was suggested in pytorch#162878 (comment)

Pull Request resolved: pytorch#163111
Approved by: https://github.com/huydhn
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
This reverts commit 16475a8.

Reverted pytorch#163111 on behalf of https://github.com/malfet due to It started to fail now, but worked just fine in PR CI ([comment](pytorch#163111 (comment)))
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
To make CI machines capable of running CUDA-13 tests. Unfortunately, this upgrade regresses NUMBA integration, so live patch it with NVIDIA/numba-cuda@6e08c9d

This fix was suggested in pytorch#162878 (comment)

Pull Request resolved: pytorch#163111
Approved by: https://github.com/huydhn
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
To make CI machines capable of running CUDA-13 tests. Unfortunately, this upgrade regresses NUMBA integration, so live patch it with NVIDIA/numba-cuda@6e08c9d

This fix was suggested in pytorch#162878 (comment)

Pull Request resolved: pytorch#163111
Approved by: https://github.com/huydhn
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
This reverts commit 16475a8.

Reverted pytorch#163111 on behalf of https://github.com/malfet due to It started to fail now, but worked just fine in PR CI ([comment](pytorch#163111 (comment)))
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
To make CI machines capable of running CUDA-13 tests. Unfortunately, this upgrade regresses NUMBA integration, so live patch it with NVIDIA/numba-cuda@6e08c9d

This fix was suggested in pytorch#162878 (comment)

Pull Request resolved: pytorch#163111
Approved by: https://github.com/huydhn
@atalman
Contributor

atalman commented Sep 22, 2025

@pytorchbot cherry-pick --onto release/2.9 --fixes "Critical CI fix" -c critical

pytorchbot pushed a commit that referenced this pull request Sep 22, 2025
To make CI machines capable of running CUDA-13 tests. Unfortunately, this upgrade regresses NUMBA integration, so live patch it with NVIDIA/numba-cuda@6e08c9d

This fix was suggested in #162878 (comment)

Pull Request resolved: #163111
Approved by: https://github.com/huydhn

(cherry picked from commit 8dbac62)
@pytorchbot
Collaborator

Cherry picking #163111

The cherry pick PR is at #163522 and it is linked with issue Critical CI fix. The following tracker issues are updated:


atalman pushed a commit that referenced this pull request Sep 22, 2025
[CI] Update NVIDIA driver to `580.82.07` (#163111)

To make CI machines capable of running CUDA-13 tests. Unfortunately, this upgrade regresses NUMBA integration, so live patch it with NVIDIA/numba-cuda@6e08c9d

This fix was suggested in #162878 (comment)

Pull Request resolved: #163111
Approved by: https://github.com/huydhn

(cherry picked from commit 8dbac62)

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
To make CI machines capable of running CUDA-13 tests. Unfortunately, this upgrade regresses NUMBA integration, so live patch it with NVIDIA/numba-cuda@6e08c9d

This fix was suggested in pytorch#162878 (comment)

Pull Request resolved: pytorch#163111
Approved by: https://github.com/huydhn
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
This reverts commit 16475a8.

Reverted pytorch#163111 on behalf of https://github.com/malfet due to It started to fail now, but worked just fine in PR CI ([comment](pytorch#163111 (comment)))
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
To make CI machines capable of running CUDA-13 tests. Unfortunately, this upgrade regresses NUMBA integration, so live patch it with NVIDIA/numba-cuda@6e08c9d

This fix was suggested in pytorch#162878 (comment)

Pull Request resolved: pytorch#163111
Approved by: https://github.com/huydhn
CSkmd added a commit to CSkmd/pytorch that referenced this pull request Sep 29, 2025
The live patch for numba.cuda introduced in pytorch#163111 causes issues in ROCm CI jobs, which do not use CUDA. This change restricts the patching logic to only run when $BUILD_ENVIRONMENT contains 'cuda'.
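
A minimal sketch of the guard described here, assuming the check is expressed in Python rather than in the shell scripts the CI actually uses; BUILD_ENVIRONMENT is the job-name string exported by the PyTorch CI jobs.

```python
import os


def build_is_cuda() -> bool:
    # BUILD_ENVIRONMENT holds the CI job name, e.g. a string containing "cuda" or "rocm".
    return "cuda" in os.environ.get("BUILD_ENVIRONMENT", "")


if __name__ == "__main__":
    if build_is_cuda():
        print("CUDA build detected: apply the numba-cuda live patch")
    else:
        print("Non-CUDA build (e.g. ROCm): skip the numba-cuda live patch")
```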
CSkmd added a commit to CSkmd/pytorch that referenced this pull request Sep 30, 2025
The patch introduced in pytorch#163111 causes issues in ROCm CI jobs. This change restricts the patching logic to CUDA environments only.
CSkmd added a commit to CSkmd/pytorch that referenced this pull request Sep 30, 2025
The patch introduced in pytorch#163111 caused issues in ROCm environments. This change guards the patching logic to CUDA environments only, thus unblocking ROCm builds.
pytorchmergebot pushed a commit that referenced this pull request Oct 4, 2025
The patch introduced in #163111 caused issues in ROCm environments. This change guards the patching logic to CUDA environments only, thus ameliorating test failures in ROCm environments.
Pull Request resolved: #164607
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
cyyever pushed a commit to cyyever/pytorch that referenced this pull request Oct 4, 2025

The patch introduced in pytorch#163111 caused issues in ROCm environments. This change guards the patching logic to CUDA environments only, thus ameliorating test failures in ROCm environments.
Pull Request resolved: pytorch#164607
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025

The patch introduced in pytorch#163111 caused issues in ROCm environments. This change guards the patching logic to CUDA environments only, thus ameliorating test failures in ROCm environments.
Pull Request resolved: pytorch#164607
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
@github-actions github-actions bot deleted the gh/malfet/523/head branch October 23, 2025 02:12

Labels

ci-no-td: Do not run TD on this PR
ciflow/periodic: Trigger jobs run periodically on master (periodic.yml) on the PR
ciflow/trunk: Trigger trunk jobs on your pull request
Merged
Reverted
topic: not user facing (topic category)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants