Remove outdated flaky models and enable deterministic algorithms on ROCm #169024

Closed
jataylo wants to merge 15 commits into pytorch:main from jataylo:flakiness-deter

Conversation

@jataylo
Collaborator

@jataylo jataylo commented Nov 24, 2025

@jataylo jataylo added the ciflow/inductor-periodic, ciflow/inductor-perf-test-nightly-rocm-mi300 (Trigger inductor perf tests on ROCm MI300), and ciflow/inductor-perf-test-nightly-rocm-mi355 (Trigger inductor perf tests on ROCm MI355) labels Nov 24, 2025
@pytorch-bot

pytorch-bot bot commented Nov 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/169024

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 12 Unrelated Failures

As of commit b4c1548 with merge base 25c7201:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the ciflow/inductor, ciflow/rocm (Trigger "default" config CI on ROCm), module: dynamo, and module: rocm (AMD GPU support for Pytorch) labels Nov 24, 2025
@jataylo
Collaborator Author

jataylo commented Nov 28, 2025

I guess we'll need ROCm runners to come back for this...

@jataylo jataylo marked this pull request as ready for review November 28, 2025 11:36
@jataylo
Collaborator Author

jataylo commented Dec 1, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch push -f https://github.com/jataylo/pytorch.git pull/169024/head:flakiness-deter returned non-zero exit code 128

remote: The 'AMD' enterprise forbids access via a personal access tokens (classic) if the token's lifetime is greater than 366 days. Please adjust your token's lifetime at the following URL: https://github.com/settings/tokens/779664343
fatal: unable to access 'https://github.com/jataylo/pytorch.git/': The requested URL returned error: 403

Raised by https://github.com/pytorch/pytorch/actions/runs/19819362064

@jeffdaily jeffdaily added the topic: not user facing (topic category) label Dec 2, 2025
@jataylo
Collaborator Author

jataylo commented Dec 9, 2025

The tts_angular and gpt2sequence failures are unrelated, so I'm merging this. The rest are addressed in #168073.

@jataylo
Collaborator Author

jataylo commented Dec 9, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Dec 9, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@jataylo
Collaborator Author

jataylo commented Dec 9, 2025

@pytorchbot merge -f "ROCm unrelated failures in benchmark suite, these failures have snuck in and aren't caused by the PR"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

meta-codesync bot pushed a commit to pytorch/benchmark that referenced this pull request Dec 10, 2025
…OCm (#169024)

Summary:
Remove outdated flaky models and enable deterministic algorithms on ROCm

X-link: pytorch/pytorch#169024
Approved by: https://github.com/jeffdaily

Reviewed By: izaitsevfb

Differential Revision: D88758770

fbshipit-source-id: 8ea515ea1e7e954212da1e3b1dda89c89b87e438

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
skpark-rh pushed a commit to skpark-rh/pytorch that referenced this pull request Dec 10, 2025
…OCm (pytorch#169024)

Remove outdated flaky models and enable deterministic algorithms on ROCm

Pull Request resolved: pytorch#169024
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Labels

ciflow/inductor
ciflow/inductor-perf-test-nightly-rocm-mi300 (Trigger inductor perf tests on ROCm MI300)
ciflow/inductor-perf-test-nightly-rocm-mi355 (Trigger inductor perf tests on ROCm MI355)
ciflow/inductor-periodic
ciflow/rocm (Trigger "default" config CI on ROCm)
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
module: dynamo
module: rocm (AMD GPU support for Pytorch)
open source
topic: not user facing (topic category)

4 participants