[ROCm][CI] Change rocm periodic workflow label to linux.rocm.gpu.mi250.4 #164616

Closed
amdfaa wants to merge 1 commit into pytorch:main from amdfaa:patch-20

@amdfaa (Contributor) commented Oct 3, 2025

Testing done on this PR: #156491

New label linux.rocm.gpu.mi250.4 uses K8s-based ARC runners.

From the "Job Duration (all branches)" table on metrics.pytorch.org:
231 jobs per month * 3 H per job * 3 shards = 2079 H per month
2079 H per month / 30 days = 69.3 H per day
69.3 H per day / 3 H per periodic job = ~23 runners (assuming all jobs run simultaneously) = 12 nodes

The 69.3 H per day figure is total job time, so it assumes the jobs do not run in parallel. Dividing it by the duration of each job gives the maximum number of concurrent runners (23, i.e. 12 nodes) needed to handle the average daily job load. However, since those jobs usually do not all run concurrently, we started with 5 nodes (i.e. 10 runners) for this label.
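
The sizing estimate above can be reproduced with a short script. This is only a sketch of the arithmetic in this description, not tooling used by the CI: the function name `estimate_capacity` and its parameter names are hypothetical, and the value of 2 runners per node is an assumption inferred from "5 nodes (i.e. 10 runners)".

```python
import math


def estimate_capacity(jobs_per_month, hours_per_job, shards, days_per_month,
                      runners_per_node):
    """Reproduce the runner/node estimate from the PR description.

    All names here are illustrative; only the input figures come from the PR.
    """
    # Total runner time consumed per month: each job runs `shards` shards,
    # each taking roughly `hours_per_job` hours.
    runner_hours_per_month = jobs_per_month * hours_per_job * shards

    # Average runner time needed per day if nothing ran in parallel.
    runner_hours_per_day = runner_hours_per_month / days_per_month

    # Dividing by the job duration gives the worst-case number of runners
    # needed if the whole daily load ran at the same time.
    peak_runners = runner_hours_per_day / hours_per_job

    # The PR rounds this to ~23 runners; nodes are rounded up because a
    # partially used node is still a whole node.
    nodes = math.ceil(round(peak_runners) / runners_per_node)

    return {
        "runner_hours_per_month": runner_hours_per_month,
        "runner_hours_per_day": runner_hours_per_day,
        "peak_runners": peak_runners,
        "nodes": nodes,
    }


if __name__ == "__main__":
    # Figures quoted in the PR; runners_per_node=2 is an inferred assumption.
    estimate = estimate_capacity(
        jobs_per_month=231,
        hours_per_job=3,
        shards=3,
        days_per_month=30,
        runners_per_node=2,
    )
    for name, value in estimate.items():
        print(f"{name}: {value}")
```

The result is an upper bound for the average load, which is why the initial allocation of 5 nodes is smaller than the computed 12.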

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

@amdfaa amdfaa requested a review from a team as a code owner October 3, 2025 22:40

pytorch-bot bot commented Oct 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164616

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 654b9b1 with merge base 1894082:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the "topic: not user facing" label Oct 3, 2025
@amdfaa amdfaa changed the title from "change workflow label" to "[ROCM] change workflow label" Oct 3, 2025
@pytorch-bot pytorch-bot bot added the "module: rocm" label Oct 3, 2025
@amdfaa amdfaa changed the title from "[ROCM] change workflow label" to "[ROCm][CI] Change rocm periodic workflow label to linux.rocm.gpu.mi250.4" Oct 3, 2025
@jeffdaily (Collaborator) commented:

@pytorchbot merge -f "rocm workflow runner label change"

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorch-bot pytorch-bot bot added the "ciflow/rocm" label Oct 6, 2025
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025

Labels

ciflow/rocm (Trigger "default" config CI on ROCm)
Merged
module: rocm (AMD GPU support for Pytorch)
open source
topic: not user facing (topic category)