[ROCm][CI] Run PR-Based workflow runs on mi300 nodes. #167225

Closed
amdfaa wants to merge 9 commits into pytorch:main from amdfaa:patch-31

@amdfaa (Contributor) commented Nov 6, 2025

This PR moves the PR-based ciflow tags from the MI200 nodes (less stable) to the MI300 nodes (more stable), so that developers see consistent testing on their PRs as well as on main. It makes the following changes:

  • Rename rocm.yml to rocm-mi200.yml : for clarity

  • Add ciflow/rocm-mi200 trigger to rocm-mi200.yml : for devs who want to opt-in to single-GPU unit tests on MI200

  • Move ciflow/rocm trigger from rocm-mi200.yml to rocm-mi300.yml : so PRs target MI300 runners by default

  • Rename inductor-rocm.yml to inductor-rocm-mi200.yml : for clarity

  • Remove ciflow/inductor-rocm trigger from inductor-rocm-mi200.yml : to prevent MI200 inductor config unit tests from being triggered by default

  • Add ciflow/inductor-rocm-mi200 trigger to inductor-rocm-mi200.yml : for devs who want to opt-in to inductor config unit tests on MI200

  • Move ciflow/periodic trigger from periodic-rocm-mi200.yml to periodic-rocm-mi300.yml : so PRs target MI300 runners by default
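
As a rough sketch of what the trigger moves above could look like in the workflow files: this assumes ciflow labels reach workflows as pushed tags matching `ciflow/<name>/*`, and the excerpts below are hypothetical rather than copied from the PR diff, which is not shown in this conversation.

```yaml
# Hypothetical excerpt of .github/workflows/rocm-mi300.yml (assumed layout).
on:
  push:
    tags:
      - ciflow/rocm/*         # moved here from rocm-mi200.yml, so the default label targets MI300
  workflow_dispatch:
---
# Hypothetical excerpt of .github/workflows/rocm-mi200.yml (formerly rocm.yml).
on:
  push:
    tags:
      - ciflow/rocm-mi200/*   # new opt-in trigger for single-GPU unit tests on MI200
  workflow_dispatch:
```

With triggers along these lines, adding the ciflow/rocm label to a PR would start the MI300 jobs, while the MI200 jobs would run only when ciflow/rocm-mi200 is added explicitly.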

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

@amdfaa requested a review from a team as a code owner November 6, 2025 16:26
@pytorch-bot (bot) commented Nov 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167225

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 326f83c with merge base 9b4ac45 (image):

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot (bot) added the module: rocm (AMD GPU support for Pytorch) and topic: not user facing (topic category) labels Nov 6, 2025
@amdfaa changed the title from "[ROCm][CI] Add periodic tag to the mi300 workflow" to "[ROCm][CI] Run PR-Based workflow runs on mi300 nodes." Nov 6, 2025
@jithunnair-amd added the following labels Nov 7, 2025:

  • ciflow/periodic : Trigger jobs ran periodically on master (periodic.yml) on the PR

  • ciflow/rocm : Trigger "default" config CI on ROCm

  • ciflow/inductor-rocm : Trigger "inductor" config CI on ROCm

@jithunnair-amd requested a review from huydhn November 7, 2025 00:47
@huydhn (Contributor) left a comment

LGTM!

@pytorch-bot (bot) removed the following labels Nov 7, 2025:

  • ciflow/periodic : Trigger jobs ran periodically on master (periodic.yml) on the PR

  • ciflow/rocm : Trigger "default" config CI on ROCm

  • ciflow/inductor-rocm : Trigger "inductor" config CI on ROCm

@pytorch-bot (bot) added the ciflow/rocm (Trigger "default" config CI on ROCm) label Nov 7, 2025
@jithunnair-amd (Collaborator) commented

@pytorchbot merge -f "Lint and other failures seem unrelated; PR labels triggered MI300 workflows successfully"

@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f only as a last resort; consider -i/--ignore-current instead, which continues the merge while ignoring current failures. This allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@jithunnair-amd added the ciflow/inductor-rocm-mi300 (Trigger "inductor" config CI on ROCm MI300/MI325) label Nov 7, 2025
pytorchmergebot pushed a commit that referenced this pull request Nov 11, 2025
Fixes issue with uploading artifacts, which was inadvertently disabled for some renamed workflows via #167225

Pull Request resolved: #167483
Approved by: https://github.com/jeffdaily
Silv3S pushed a commit to Silv3S/pytorch that referenced this pull request Nov 18, 2025
Pull Request resolved: pytorch#167225
Approved by: https://github.com/jeffdaily, https://github.com/huydhn

Co-authored-by: Jithun Nair <jithun.nair@amd.com>
Silv3S pushed a commit to Silv3S/pytorch that referenced this pull request Nov 18, 2025
Fixes issue with uploading artifacts, which was inadvertently disabled for some renamed workflows via pytorch#167225

Pull Request resolved: pytorch#167483
Approved by: https://github.com/jeffdaily

Labels

  • ciflow/inductor-rocm-mi300 : Trigger "inductor" config CI on ROCm MI300/MI325

  • ciflow/rocm : Trigger "default" config CI on ROCm

  • Merged

  • module: rocm : AMD GPU support for Pytorch

  • open source

  • topic: not user facing : topic category
