Conversation
Contributor
Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
Copilot
AI
changed the title
[WIP] CI: Consolidate the test matrix in a json file
CI: Consolidate test matrix configurations into ci/test-matrix.json
Aug 22, 2025
leofang
reviewed
Aug 22, 2025
Hard-code all GPU and ARCH values in test-matrix.json with 6 fields per entry
Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
Copilot
AI
changed the title
CI: Consolidate test matrix configurations into ci/test-matrix.json
CI: Consolidate test matrix configurations into ci/test-matrix.json with hard-coded values
Aug 22, 2025
leofang
reviewed
Aug 24, 2025
Update Windows test matrix with a100 GPU and latest-1 driver, configure self-hosted runners
Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
Copilot
AI
changed the title
CI: Consolidate test matrix configurations into ci/test-matrix.json with hard-coded values
CI: Consolidate test matrix configurations into ci/test-matrix.json with hard-coded values and Windows self-hosted runners
Aug 24, 2025
leofang
reviewed
Aug 24, 2025
Member
/ok to test eed0b71
leofang
reviewed
Aug 24, 2025
Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
Revert Windows workflow to GitHub-hosted runners with TODO comments for future self-hosted migration
Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
Copilot
AI
changed the title
CI: Consolidate test matrix configurations into ci/test-matrix.json with hard-coded values, Windows self-hosted runners, proxy cache support, and performance optimizations
CI: Consolidate test matrix configurations into ci/test-matrix.json with hard-coded values, optimized checkout, and prepared Windows self-hosted runner migration
Sep 5, 2025
leofang
approved these changes
Sep 5, 2025
Member
leofang
left a comment
Now that we confirmed the self-hosted Win runners work, I have temporarily reverted back to GH-hosted runners to make this PR mergeable. I still want the JSON refactoring to be merged.
Member
pre-commit.ci autofix
Member
/ok to test 905ad53
Member
This is ready.
cryos
previously approved these changes
Sep 5, 2025
Collaborator
cryos
left a comment
LGTM, one small nit inline, but I think it is fine.
leofang
reviewed
Sep 5, 2025
Member
Backport failed for 12.9.x. Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 12.9.x
git worktree add -d .worktree/backport-889-to-12.9.x origin/12.9.x
cd .worktree/backport-889-to-12.9.x
git switch --create backport-889-to-12.9.x
git cherry-pick -x b4644f2870afe1ed43e8094c8dfac5a99870702a
kkraus14
pushed a commit
to kkraus14/cuda-python
that referenced
this pull request
Sep 10, 2025
CI: Consolidate test matrix configurations into ci/test-matrix.json with hard-coded values, optimized checkout, and prepared Windows self-hosted runner migration (NVIDIA#889)

* Initial plan
* Consolidate test matrices from workflows into ci/test-matrix.json
* Hard-code all GPU and ARCH values in test-matrix.json with 6 fields per entry
* Update Windows test matrix with a100 GPU and latest-1 driver, configure self-hosted runners
* fix
* Revert eed0b71 and change Windows DRIVER from latest-1 to latest
* Add proxy cache setup to Windows workflow for self-hosted runners
* Remove Git for Windows and gh CLI installation steps, add T4 GPU support to Windows matrix
* Set fetch-depth: 1 for checkout steps and favor L4/T4 over A100 GPUs for Windows testing
* Revert Windows workflow to GitHub-hosted runners with TODO comments for future self-hosted migration
* [pre-commit.ci] auto code formatting
* Revert Win runner name change for now

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
Co-authored-by: Leo Fang <leof@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
mdboom
pushed a commit
to mdboom/cuda-python
that referenced
this pull request
Sep 10, 2025
CI: Consolidate test matrix configurations into ci/test-matrix.json with hard-coded values, optimized checkout, and prepared Windows self-hosted runner migration (NVIDIA#889)

* Initial plan
* Consolidate test matrices from workflows into ci/test-matrix.json
* Hard-code all GPU and ARCH values in test-matrix.json with 6 fields per entry
* Update Windows test matrix with a100 GPU and latest-1 driver, configure self-hosted runners
* fix
* Revert eed0b71 and change Windows DRIVER from latest-1 to latest
* Add proxy cache setup to Windows workflow for self-hosted runners
* Remove Git for Windows and gh CLI installation steps, add T4 GPU support to Windows matrix
* Set fetch-depth: 1 for checkout steps and favor L4/T4 over A100 GPUs for Windows testing
* Revert Windows workflow to GitHub-hosted runners with TODO comments for future self-hosted migration
* [pre-commit.ci] auto code formatting
* Revert Win runner name change for now

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
Co-authored-by: Leo Fang <leof@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
leofang
added a commit
that referenced
this pull request
Sep 11, 2025
* bump all CI jobs to CUDA 12.9.1
* CI: Consolidate test matrix configurations into ci/test-matrix.json with hard-coded values, optimized checkout, and prepared Windows self-hosted runner migration (#889)
  * Initial plan
  * Consolidate test matrices from workflows into ci/test-matrix.json
  * Hard-code all GPU and ARCH values in test-matrix.json with 6 fields per entry
  * Update Windows test matrix with a100 GPU and latest-1 driver, configure self-hosted runners
  * fix
  * Revert eed0b71 and change Windows DRIVER from latest-1 to latest
  * Add proxy cache setup to Windows workflow for self-hosted runners
  * Remove Git for Windows and gh CLI installation steps, add T4 GPU support to Windows matrix
  * Set fetch-depth: 1 for checkout steps and favor L4/T4 over A100 GPUs for Windows testing
  * Revert Windows workflow to GitHub-hosted runners with TODO comments for future self-hosted migration
  * [pre-commit.ci] auto code formatting
  * Revert Win runner name change for now
  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
  Co-authored-by: Leo Fang <leof@nvidia.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* forgot to add windows
* rerun codegen with 12.9.1 and update result/error explanations
* First stab at the filter for CUDA < 13 in CI
* Get data from the top-level array
* Use the map function on select output
* CI: Move to self-hosted Windows GPU runners
  Migrate the Windows testing to use the new NV GHA runners. Cherry-pick #958.

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
Co-authored-by: Leo Fang <leof@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Marcus D. Hanwell <mhanwell@nvidia.com>
This PR consolidates the hardcoded test matrices from GitHub workflow files into a centralized JSON configuration file, addressing the issue where test matrix data was scattered and embedded within execution logic.
Changes Made
New file:
- ci/test-matrix.json: the consolidated test matrix, with six hard-coded fields per entry (ARCH, PY_VER, CUDA_VER, LOCAL_CTK, GPU, DRIVER)

Updated workflows:
- .github/workflows/test-wheel-linux.yml: replaced the 58-line hardcoded matrix with JSON-based logic, removed runtime GPU substitution, optimized checkout with fetch-depth: 1, and configured self-hosted runners with proxy cache support
- .github/workflows/test-wheel-windows.yml: replaced the 12-line hardcoded matrix with JSON-based logic, added the missing GPU/DRIVER fields, prepared for a future self-hosted runner migration with TODO comments, and optimized checkout with fetch-depth: 1 for the compute-matrix job

Matrix Structure
The JSON now contains explicit entries for each architecture combination:
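As a minimal sketch of that structure (the grouping keys, version numbers, GPU models, and driver values below are illustrative assumptions, not the file's actual entries), each entry carries the same six hard-coded fields:

```json
{
  "linux-amd64": [
    { "ARCH": "amd64", "PY_VER": "3.12", "CUDA_VER": "12.9.0", "LOCAL_CTK": "1", "GPU": "l4", "DRIVER": "latest" },
    { "ARCH": "amd64", "PY_VER": "3.13", "CUDA_VER": "12.9.0", "LOCAL_CTK": "0", "GPU": "l4", "DRIVER": "latest" }
  ],
  "linux-arm64": [
    { "ARCH": "arm64", "PY_VER": "3.12", "CUDA_VER": "12.9.0", "LOCAL_CTK": "1", "GPU": "a100", "DRIVER": "latest" }
  ],
  "windows-amd64": [
    { "ARCH": "amd64", "PY_VER": "3.12", "CUDA_VER": "12.9.0", "LOCAL_CTK": "1", "GPU": "t4", "DRIVER": "latest" }
  ]
}
```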
All entries follow the consistent 6-field structure with hard-coded values, eliminating runtime evaluation and improving maintainability.
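For orientation, the "JSON-based logic" in a compute-matrix job can be as small as a single jq step that publishes the selected entries as a job output. This is only a sketch of that pattern under the grouping assumed above, not the exact step in the workflows:

```yaml
jobs:
  compute-matrix:
    runs-on: ubuntu-latest
    outputs:
      MATRIX: ${{ steps.compute.outputs.MATRIX }}
    steps:
      - uses: actions/checkout@v4
      - name: Read the test matrix for this platform
        id: compute
        run: |
          # Select this platform's entries from the consolidated file and
          # publish them as a compact JSON array for the test job to consume.
          MATRIX="$(jq -c '."linux-amd64"' ci/test-matrix.json)"
          echo "MATRIX=${MATRIX}" >> "$GITHUB_OUTPUT"
```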
Self-Hosted Runner Configuration
Linux workflow uses configurable self-hosted runners (sketched below):
- Runner label: "linux-${{ matrix.ARCH }}-gpu-${{ matrix.GPU }}-${{ matrix.DRIVER }}-1"
- Proxy cache configured via nv-gha-runners/setup-proxy-cache@main

Windows workflow remains on GitHub-hosted runners, with migration preparation:
- Current runner: 'cuda-python-windows-gpu-github'
- Prepared label: "windows-${{ matrix.ARCH }}-gpu-${{ matrix.GPU }}-${{ matrix.DRIVER }}-1" (commented out with a TODO)
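A sketch of how the Linux test job could tie these pieces together (the job layout and step names are illustrative; only the runner-label pattern and the proxy-cache action come from this PR):

```yaml
test:
  needs: compute-matrix
  strategy:
    fail-fast: false
    matrix:
      include: ${{ fromJSON(needs.compute-matrix.outputs.MATRIX) }}
  # Each matrix entry selects its own self-hosted runner,
  # e.g. linux-amd64-gpu-l4-latest-1
  runs-on: "linux-${{ matrix.ARCH }}-gpu-${{ matrix.GPU }}-${{ matrix.DRIVER }}-1"
  steps:
    - name: Set up proxy cache
      uses: nv-gha-runners/setup-proxy-cache@main
    - name: Checkout
      uses: actions/checkout@v4
```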
Performance Optimizations
- Checkout uses fetch-depth: 1 for the compute-matrix jobs, since git history and tags are not needed there (see below)
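The corresponding checkout step would look roughly like this (the action version is an assumption):

```yaml
- name: Checkout (shallow)
  uses: actions/checkout@v4
  with:
    fetch-depth: 1  # only the matrix file is needed, not git history or tags
```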
Windows GPU Configuration

The Windows test matrix uses both T4 and L4 GPUs to maintain compatibility while leveraging newer capabilities.
This balanced approach ensures testing coverage across both GPU types while favoring L4/T4 over A100 as recommended.
Migration Strategy
The Windows workflow includes TODO comments and a prepared configuration for a seamless migration to self-hosted runners.
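A sketch of what that prepared configuration could look like (the comment wording is an assumption; the two runner labels are the ones listed above):

```yaml
test:
  # TODO: switch to self-hosted runners once they are available:
  # runs-on: "windows-${{ matrix.ARCH }}-gpu-${{ matrix.GPU }}-${{ matrix.DRIVER }}-1"
  runs-on: 'cuda-python-windows-gpu-github'
```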
Preserved Functionality
All existing behavior is maintained:
- matrix_filter input (one possible application is sketched below)
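Whether the workflows apply the filter exactly this way is not shown here, but a common pattern is to treat matrix_filter as a jq expression applied after selecting the platform's entries (inputs.matrix_filter and the platform key are assumptions):

```yaml
- name: Compute and filter the test matrix
  id: compute
  run: |
    # inputs.matrix_filter defaults to the identity filter "." and can be
    # overridden by callers, e.g. '[ .[] | select(.PY_VER == "3.12") ]'.
    MATRIX="$(jq -c '."windows-amd64" | ${{ inputs.matrix_filter }}' ci/test-matrix.json)"
    echo "MATRIX=${MATRIX}" >> "$GITHUB_OUTPUT"
```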
Benefits

The implementation maintains complete backward compatibility while significantly improving maintainability and performance, and it follows the principle of separating data from execution logic. The Windows workflow is fully prepared for a future migration to self-hosted runners when they become available.
Fixes #888.