
Continue to build nightly CUDA 12.9 for internal #163029

Closed

huydhn wants to merge 3 commits into pytorch:main from huydhn:continue-build-cu129-for-vllm

Conversation

@huydhn
Contributor

@huydhn huydhn commented Sep 16, 2025

Revert part of #161916 to continue building CUDA 12.9 nightly

cc @albanD

@huydhn huydhn requested review from atalman and malfet September 16, 2025 00:57
@huydhn huydhn requested a review from a team as a code owner September 16, 2025 00:57
@huydhn huydhn added the ciflow/binaries_wheel (Trigger binary build and upload jobs for wheel on the PR) and test-config/default labels Sep 16, 2025
@pytorch-bot

pytorch-bot bot commented Sep 16, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163029

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (2 Unrelated Failures)

As of commit 6295362 with merge base 12d7cc5:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Contributor

@malfet malfet left a comment


Please mention an issue that sets a deadline for when to revert it, but sure, why not

@huydhn
Contributor Author

huydhn commented Sep 17, 2025

Please mention an issue that sets a deadline for when to revert it, but sure, why not

I also doubt my sanity in doing this, so let's get this one ready but not land it unless we really need it. Also, a note here: pytorch/test-infra#7074 needs to be reverted too to build domains on 12.9

@atalman
Contributor

atalman commented Sep 22, 2025

@huydhn please provide some context on this. Supporting 4 CUDA versions across 3 platforms is quite expensive. Can we build only a specific Python version and only Linux?

@huydhn
Contributor Author

huydhn commented Sep 22, 2025

@huydhn please provide some context on this. Supporting 4 CUDA versions across 3 platforms is quite expensive. Can we build only a specific Python version and only Linux?

I'm keeping this around in case people ask for it internally (post). From the responses so far, I don't think there is enough incentive to land this yet

@huydhn
Contributor Author

huydhn commented Oct 11, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
@pytorchmergebot
Collaborator

Successfully rebased continue-build-cu129-for-vllm onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout continue-build-cu129-for-vllm && git pull --rebase)
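The bot's follow-up instruction is standard practice after a force-push: rebase your local copy onto the rewritten remote branch rather than merging. The sandbox below is a minimal sketch of why `git pull --rebase` is the right sync command; the throwaway repositories, paths, and commit messages are hypothetical, not part of this PR. After the remote branch is rewritten, the contributor's stale local commit is patch-identical to the rewritten one, so the rebase drops it and the local branch ends up exactly at the new remote tip.

```shell
# Hypothetical sandbox: simulate a bot force-pushing a rewritten branch,
# then sync a contributor clone with `git pull --rebase`.
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare remote.git
git clone -q remote.git contributor 2>/dev/null
git clone -q remote.git bot 2>/dev/null
for d in contributor bot; do
  git -C "$d" config user.email dev@example.com
  git -C "$d" config user.name Dev
done

# Contributor publishes the PR branch with one commit.
cd contributor
git checkout -q -b continue-build-cu129-for-vllm
echo change > fix.txt
git add fix.txt
git commit -qm "Continue to build CUDA 12.9"
git push -q origin continue-build-cu129-for-vllm

# "Bot" rewrites the branch (an amend stands in for a rebase) and force-pushes.
cd ../bot
git fetch -q origin
git checkout -q continue-build-cu129-for-vllm
git commit -q --amend -m "Continue to build CUDA 12.9 (rebased)"
git push -qf origin continue-build-cu129-for-vllm

# Contributor syncs as the bot suggests: the stale local commit is
# patch-identical to the rewritten one, so the rebase skips it.
cd ../contributor
git pull -q --rebase origin continue-build-cu129-for-vllm
synced_title=$(git log --format=%s -1)
echo "$synced_title"
```

A plain `git pull` in the same situation would instead try to merge the old and rewritten histories, leaving a confusing merge commit on the PR branch.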

@pytorchmergebot pytorchmergebot force-pushed the continue-build-cu129-for-vllm branch from cc72948 to b1f78ae on October 11, 2025 00:14
Signed-off-by: Huy Do <huydhn@gmail.com>
@huydhn
Contributor Author

huydhn commented Oct 11, 2025

@pytorchbot drci

@huydhn
Contributor Author

huydhn commented Oct 11, 2025

@pytorchbot merge -f '12.9 build looks ok'

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

@huydhn
Contributor Author

huydhn commented Oct 14, 2025

@pytorchbot --help

@pytorch-bot

pytorch-bot bot commented Oct 14, 2025

PyTorchBot Help

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick} ...

In order to invoke the bot on your PR, include a line that starts with
@pytorchbot anywhere in a comment. That line will form the command; no
multi-line commands are allowed. Some commands may be used on issues as specified below.

Example:
    Some extra context, blah blah, wow this PR looks awesome

    @pytorchbot merge

optional arguments:
  -h, --help            Show this help message and exit.

command:
  {merge,revert,rebase,label,drci,cherry-pick}
    merge               Merge a PR
    revert              Revert a PR
    rebase              Rebase a PR
    label               Add label to a PR
    drci                Update Dr. CI
    cherry-pick         Cherry pick a PR onto a release branch

Merge

usage: @pytorchbot merge [-f MESSAGE | -i] [-ic] [-r [{viable/strict,main}]]

Merge an accepted PR, subject to the rules in .github/merge_rules.json.
By default, this will wait for all required checks (lint, pull) to succeed before merging.

optional arguments:
  -f MESSAGE, --force MESSAGE
                        Merge without checking anything. This requires a reason for auditing purposes, for example:
                        @pytorchbot merge -f 'Minor update to fix lint. Expecting all PR tests to pass'
                        
                        Please use `-f` as a last resort; prefer `--ignore-current` to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.
  -i, --ignore-current  Merge while ignoring the currently failing jobs.  Behaves like -f if there are no pending jobs.
  -ic                   Old flag for --ignore-current. Deprecated in favor of -i.
  -r [{viable/strict,main}], --rebase [{viable/strict,main}]
                        Rebase the PR to re-run checks before merging. Accepts viable/strict or main as branch options and defaults to viable/strict if not specified.

Revert

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst,autorevert}

Revert a merged PR. This requires that you are a Meta employee.

Example:
  @pytorchbot revert -m="This is breaking tests on trunk. hud.pytorch.org/" -c=nosignal

optional arguments:
  -m MESSAGE, --message MESSAGE
                        The reason you are reverting, which will be put in the commit message. Must be longer than 3 words.
  -c {nosignal,ignoredsignal,landrace,weird,ghfirst,autorevert}, --classification {nosignal,ignoredsignal,landrace,weird,ghfirst,autorevert}
                        A machine-friendly classification of the revert reason.

Rebase

usage: @pytorchbot rebase [-s | -b BRANCH]

Rebase a PR. Rebasing defaults to the stable viable/strict branch of pytorch.
Repeat contributors may use this command to rebase their PR.

optional arguments:
  -s, --stable          [DEPRECATED] Rebase onto viable/strict
  -b BRANCH, --branch BRANCH
                        Branch you would like to rebase to

Label

usage: @pytorchbot label labels [labels ...]

Adds labels to a PR or Issue [Can be used on Issues]

positional arguments:
  labels  Labels to add to given Pull Request or Issue [Can be used on Issues]

Dr CI

usage: @pytorchbot drci 

Update Dr. CI. Updates the Dr. CI comment on the PR in case it's gotten out of sync with actual CI results.

cherry-pick

usage: @pytorchbot cherry-pick --onto ONTO [--fixes FIXES] -c
                               {regression,critical,fixnewfeature,docs,release}

Cherry pick a pull request onto a release branch for inclusion in a release

optional arguments:
  --onto ONTO, --into ONTO
                        Branch you would like to cherry pick onto (Example: release/2.1)
  --fixes FIXES         Link to the issue that your PR fixes (Example: https://github.com/pytorch/pytorch/issues/110666)
  -c {regression,critical,fixnewfeature,docs,release}, --classification {regression,critical,fixnewfeature,docs,release}
                        A machine-friendly classification of the cherry-pick reason.

@huydhn
Contributor Author

huydhn commented Oct 14, 2025

@pytorchbot cherry-pick --onto release/2.9 --fixes 'vLLM CUDA 12.9 build' -c release

pytorchbot pushed a commit that referenced this pull request Oct 14, 2025
Revert part of #161916 to continue building CUDA 12.9 nightly

Pull Request resolved: #163029
Approved by: https://github.com/malfet

(cherry picked from commit 4400c5d)
@pytorchbot
Collaborator

Cherry picking #163029

The cherry-pick PR is at #165466 and is linked with the issue "vLLM CUDA 12.9 build". The following tracker issues are updated:


Camyll pushed a commit that referenced this pull request Oct 16, 2025
* Continue to build nightly CUDA 12.9 for internal (#163029)

Revert part of #161916 to continue building CUDA 12.9 nightly

Pull Request resolved: #163029
Approved by: https://github.com/malfet

(cherry picked from commit 4400c5d)

* Fix lint

Signed-off-by: Huy Do <huydhn@gmail.com>

---------

Signed-off-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
pytorchmergebot pushed a commit that referenced this pull request Oct 18, 2025
[CD] Apply the fix from #162455 to aarch64+cu129 build (#165794)

When trying to bring cu129 back in #163029, I mainly looked at #163029 and missed another tweak coming from #162455.

I discovered this issue when testing aarch64+cu129 builds in https://github.com/pytorch/test-infra/actions/runs/18603342105/job/53046883322?pr=7373. Surprisingly, there is no test running for the aarch64 CUDA build from what I see in https://hud.pytorch.org/pytorch/pytorch/commit/79a37055e790482c12bf32e69b28c8e473d0209d.
Pull Request resolved: #165794
Approved by: https://github.com/malfet
pytorchbot pushed a commit that referenced this pull request Oct 18, 2025
[CD] Apply the fix from #162455 to aarch64+cu129 build (#165794)

(cherry picked from commit 9095a9d)
huydhn added a commit that referenced this pull request Oct 18, 2025
[CD] Apply the fix from #162455 to aarch64+cu129 build (#165794)

(cherry picked from commit 9095a9d)

Co-authored-by: Huy Do <huydhn@gmail.com>
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
Continue to build nightly CUDA 12.9 for internal (pytorch#163029)
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
[CD] Apply the fix from pytorch#162455 to aarch64+cu129 build (pytorch#165794)
zhudada0120 pushed a commit to zhudada0120/pytorch that referenced this pull request Oct 22, 2025
[CD] Apply the fix from pytorch#162455 to aarch64+cu129 build (pytorch#165794)
@huydhn huydhn deleted the continue-build-cu129-for-vllm branch December 16, 2025 08:03

Labels

ciflow/binaries_wheel (Trigger binary build and upload jobs for wheel on the PR), Merged, skip-pr-sanity-checks, test-config/default, topic: not user facing (topic category)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants