
Build vLLM nightly wheels for CUDA 13.0 #163239

Closed
huydhn wants to merge 14 commits into pytorch:main from huydhn:vllm-wheel-cuda13

Conversation

@huydhn
Contributor

@huydhn huydhn commented Sep 18, 2025

Now that vllm-project/vllm#24599 has been merged

Signed-off-by: Huy Do <huydhn@gmail.com>
@pytorch-bot

pytorch-bot bot commented Sep 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163239

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 63e37c3 with merge base a260163 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Aidyn-A
Collaborator

Aidyn-A commented Sep 18, 2025

Hmm... These segmentation faults are annoying:

2025-09-18T03:56:20.8919447Z #21 260.0 sh: line 1:  1716 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_00000199_00000000-6_flash_fwd_hdim128_bf16_sm100.compute_90a.ptx" -o "/tmp/tmpxft_00000199_00000000-11_flash_fwd_hdim128_bf16_sm100.compute_90a.cubin" > /tmp/tmpxft_00000199_00000000-13_189d18d0_stdout 2> /tmp/tmpxft_00000199_00000000-13_189d18d0_stderr
...
2025-09-18T03:56:26.6301630Z #21 265.7 sh: line 1:  1766 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_0000019c_00000000-6_flash_fwd_hdim128_bf16_sm90.compute_90a.ptx" -o "/tmp/tmpxft_0000019c_00000000-11_flash_fwd_hdim128_bf16_sm90.compute_90a.cubin" > /tmp/tmpxft_0000019c_00000000-13_403da8f0_stdout 2> /tmp/tmpxft_0000019c_00000000-13_403da8f0_stderr
...
2025-09-18T03:56:35.8654592Z #21 275.0 sh: line 1:  1813 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_000001b1_00000000-6_flash_fwd_hdim128_fp16_sm90.compute_90a.ptx" -o "/tmp/tmpxft_000001b1_00000000-11_flash_fwd_hdim128_fp16_sm90.compute_90a.cubin" > /tmp/tmpxft_000001b1_00000000-13_3bfe3ad0_stdout 2> /tmp/tmpxft_000001b1_00000000-13_3bfe3ad0_stderr
...
2025-09-18T03:58:44.1437362Z #21 403.4 sh: line 1:  2262 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_00000600_00000000-6_flash_fwd_hdim192_128_bf16_sm90.compute_90a.ptx" -o "/tmp/tmpxft_00000600_00000000-11_flash_fwd_hdim192_128_bf16_sm90.compute_90a.cubin" > /tmp/tmpxft_00000600_00000000-13_344c4aa0_stdout 2> /tmp/tmpxft_00000600_00000000-13_344c4aa0_stderr
...
2025-09-18T03:58:53.9488721Z #21 413.2 sh: line 1:  2280 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_00000679_00000000-6_flash_fwd_hdim192_128_fp16_sm90.compute_90a.ptx" -o "/tmp/tmpxft_00000679_00000000-11_flash_fwd_hdim192_128_fp16_sm90.compute_90a.cubin" > /tmp/tmpxft_00000679_00000000-13_1bcf8520_stdout 2> /tmp/tmpxft_00000679_00000000-13_1bcf8520_stderr
...
2025-09-18T03:59:43.9530305Z #21 463.1 sh: line 1:  2325 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_00000769_00000000-6_flash_fwd_hdim192_bf16_sm90.compute_90a.ptx" -o "/tmp/tmpxft_00000769_00000000-11_flash_fwd_hdim192_bf16_sm90.compute_90a.cubin" > /tmp/tmpxft_00000769_00000000-13_2d628f0_stdout 2> /tmp/tmpxft_00000769_00000000-13_2d628f0_stderr

One noticeable fact is that they are all failing on sm_90a.

@huydhn
Contributor Author

huydhn commented Sep 18, 2025

Yeah, they are coming from compiling xformers https://github.com/facebookresearch/xformers/releases/tag/v0.0.32.post2 on aarch64. I don't know what the issue is about yet, so I'd appreciate any thoughts you have.

@Aidyn-A
Collaborator

Aidyn-A commented Sep 18, 2025

I have not encountered segfaults like that, but my first action would be decreasing MAX_JOBS because those CUTLASS kernels are extremely compile-hungry.

Signed-off-by: Huy Do <huydhn@gmail.com>
@huydhn
Contributor Author

huydhn commented Sep 18, 2025

I have not encountered segfaults like that, but my first action would be decreasing MAX_JOBS because those CUTLASS kernels are extremely compile-hungry.

Ohh, you're spot on, it works after I lowered MAX_JOBS. I spoke too soon: CI hadn't actually run yet because of the merge conflicts, hence the green CI signals >_<
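For reference, the MAX_JOBS workaround discussed above can be sketched as a small shell snippet that sizes the parallel compile jobs to available memory, so the memory-hungry CUTLASS/flash-attn translation units don't exhaust RAM. The 4 GiB-per-job ratio is a guess for illustration, not a value taken from this PR:

```shell
# Cap parallel nvcc jobs based on total RAM (Linux; reads /proc/meminfo).
# Assumption: roughly one compile job per 4 GiB of memory.
mem_kib=$(awk '/MemTotal/ {print $2}' /proc/meminfo 2>/dev/null || echo 0)
jobs=$(( mem_kib / (4 * 1024 * 1024) ))
if [ "$jobs" -lt 1 ]; then jobs=1; fi
export MAX_JOBS=$jobs
echo "MAX_JOBS=$MAX_JOBS"
```

The vLLM/xformers builds read MAX_JOBS from the environment, so exporting it before invoking the wheel build is enough.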

Signed-off-by: Huy Do <huydhn@gmail.com>
@huydhn
Contributor Author

huydhn commented Sep 20, 2025

This is currently blocked by a segfault on ptxas -arch=sm_90a that @Aidyn-A discovered. We have only seen this on aarch64, but x86 might be affected too. Maybe I could try my luck and skip the aarch64 build for now.

@huydhn
Contributor Author

huydhn commented Sep 23, 2025

@pytorchbot rebase -b main

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/main pull/163239/head returned non-zero exit code 1

Rebasing (1/2)
Auto-merging .github/ci_commit_pins/vllm.txt
CONFLICT (content): Merge conflict in .github/ci_commit_pins/vllm.txt
error: could not apply 82df8a8a0ee... Build vLLM nightly wheels for CUDA 13.0
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply 82df8a8a0ee... # Build vLLM nightly wheels for CUDA 13.0

Raised by https://github.com/pytorch/pytorch/actions/runs/17938711036
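The resolution steps in the rebase hints above can be exercised end-to-end. Here is a throwaway-repo sketch (branch name mirrors this PR; the pin contents "old-pin"/"pr-pin"/"main-pin" are invented for the demo) showing one way to keep the PR's side of the conflicted pin file:

```shell
# Reproduce a pin-file rebase conflict and resolve it, following the
# "git add ... / git rebase --continue" hints from the failed rebase job.
set -e
work=$(mktemp -d)
cd "$work"
git init -q -b main demo
cd demo
git config user.email ci@example.com
git config user.name ci
mkdir -p .github/ci_commit_pins
echo "old-pin" > .github/ci_commit_pins/vllm.txt
git add -A
git commit -qm "base"
git checkout -qb vllm-wheel-cuda13
echo "pr-pin" > .github/ci_commit_pins/vllm.txt
git commit -qam "Build vLLM nightly wheels for CUDA 13.0"
git checkout -q main
echo "main-pin" > .github/ci_commit_pins/vllm.txt
git commit -qam "bump pin on main"
git checkout -q vllm-wheel-cuda13
git rebase main || true                                  # stops on the vllm.txt conflict
git checkout --theirs .github/ci_commit_pins/vllm.txt    # keep this PR's pin
git add .github/ci_commit_pins/vllm.txt
GIT_EDITOR=true git rebase --continue
cat .github/ci_commit_pins/vllm.txt                      # pr-pin
```

During a rebase, `--theirs` refers to the commit being replayed (the PR commit), which is why it keeps the PR's pin here; resolving by hand in an editor works just as well.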

@ptrblck
Collaborator

ptrblck commented Sep 24, 2025

Yeah, they are coming from compiling xformers...

@huydhn do we know if flash-attn is also built as part of xformers? If so, this fix might be needed: https://github.com/Dao-AILab/flash-attention/pull/1860/files

@johnnynunez
Contributor

fixed: facebookresearch/xformers#1337
cc @Aidyn-A

@huydhn
Contributor Author

huydhn commented Sep 26, 2025

Thanks @johnnynunez for the fix! And yes, xformers builds flash-attn.

@johnnynunez
Contributor

johnnynunez commented Oct 3, 2025

@ptrblck @huydhn all PRs necessary for vLLM CUDA 13 were merged into public vLLM (including flash-attention and the Blackwell family + CUTLASS v4.2.1). The only thing still missing is facebookresearch/xformers#1337. I think it is not merged yet because I was pointing to 2.9.0 and CUDA 13.0, and the tests were failing because those don't exist yet.

@huydhn
Contributor Author

huydhn commented Oct 12, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/163239/head returned non-zero exit code 1

Rebasing (1/2)
Auto-merging .github/ci_commit_pins/vllm.txt
CONFLICT (content): Merge conflict in .github/ci_commit_pins/vllm.txt
error: could not apply 82df8a8a0ee... Build vLLM nightly wheels for CUDA 13.0
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply 82df8a8a0ee... # Build vLLM nightly wheels for CUDA 13.0

Raised by https://github.com/pytorch/pytorch/actions/runs/18439392278

Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
@huydhn huydhn requested a review from atalman October 12, 2025 06:52
@huydhn huydhn marked this pull request as ready for review October 12, 2025 06:53
@huydhn huydhn requested a review from a team as a code owner October 12, 2025 06:53
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
@Aidyn-A
Collaborator

Aidyn-A commented Oct 13, 2025

I do not see Build vLLM wheels / Build cu130 vLLM wheel on manylinux_2_28_aarch64 in the CI, was it skipped?

matrix:
  platform: [ 'manylinux_2_28_x86_64', 'manylinux_2_28_aarch64' ]
- device: [ 'cu128', 'cu129' ]
+ device: [ 'cu128', 'cu129', 'cu130' ]
Contributor

Do we really care about cu128 here?

Contributor Author
@huydhn huydhn Oct 15, 2025

Not really, I think; this is just to stay in sync with PyTorch. I will clean 12.8 up later once 2.9 is out and vLLM is officially updated to 2.9 + CUDA 12.9.

@huydhn
Contributor Author

huydhn commented Oct 15, 2025

I do not see Build vLLM wheels / Build cu130 vLLM wheel on manylinux_2_28_aarch64 in the CI, was it skipped?

Yeah, I think I will circle back on this once 2.9 is out. The xformers FA build failure is still there, so that's one less moving piece. Let me know if that makes sense to you.

@johnnynunez
Contributor

I do not see Build vLLM wheels / Build cu130 vLLM wheel on manylinux_2_28_aarch64 in the CI, was it skipped?

Yeah, I think I will circle back on this once 2.9 is out. The xformers FA build failure is still there, so that's one less moving piece. Let me know if that makes sense to you.

now that index cu130 is public, they can run this: facebookresearch/xformers#1344

@huydhn
Contributor Author

huydhn commented Oct 16, 2025

@pytorchbot merge -f 'vLLM builds are ok'

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here
