
Conversation

@jithunnair-amd (Collaborator) commented Dec 11, 2024

Remove gfx900 and gfx906 archs as they're long in the tooth. This should help reduce the increasing size of ROCm binaries.

cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

pytorch-bot bot commented Dec 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/142827

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 3 Unrelated Failures

As of commit f6b347e with merge base 2b105de:

NEW FAILURE - The following job has failed:

UNSTABLE - The following jobs failed, but this was likely due to flakiness present on trunk; they have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/rocm Trigger "default" config CI on ROCm module: rocm AMD GPU support for Pytorch topic: not user facing topic category labels Dec 11, 2024
@jithunnair-amd jithunnair-amd marked this pull request as ready for review December 12, 2024 05:30
@jithunnair-amd (Collaborator, Author):

Libtorch and manywheel docker images built successfully with PYTORCH_ROCM_ARCH not containing gfx900 or gfx906.
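As a hypothetical sketch of what that looks like in practice (the arch list and build command below are illustrative assumptions, not taken from this PR), a build with the pruned arch list might be:

```shell
# Illustrative only: an arch list without gfx900/gfx906.
# The actual list used by the PyTorch CI docker images may differ.
export PYTORCH_ROCM_ARCH="gfx908;gfx90a;gfx1030;gfx1100"

# Build PyTorch from source with the reduced arch list;
# fewer target archs means fewer device code objects in the fat binary.
python setup.py develop
```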

@jithunnair-amd (Collaborator, Author):

@pytorchbot merge -f "Unrelated CI failures". ROCm manywheel/libtorch docker images built successfully"

pytorch-bot bot commented Dec 12, 2024

❌ 🤖 pytorchbot command failed:

Got EOF while in a quoted string
Try `@pytorchbot --help` for more info.

@jithunnair-amd (Collaborator, Author):

@pytorchbot merge -f "Unrelated CI failures. ROCm manywheel/libtorch docker images built successfully"

@jithunnair-amd jithunnair-amd changed the title [ROCm] Prune old gfx archs from binaries [ROCm] Prune old gfx archs gfx900/gfx906 from binaries Dec 12, 2024
@pytorchmergebot (Collaborator):

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f only as a last resort; instead consider -i/--ignore-current to continue the merge while ignoring current failures. This allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

@EwoutH commented Dec 13, 2024

I can’t say anything other than that I’m disappointed AMD doesn’t want to compete with Nvidia on software support.

@jeffdaily (Collaborator):

@pytorchbot revert

pytorch-bot bot commented Dec 13, 2024

❌ 🤖 pytorchbot command failed:

@pytorchbot revert: error: the following arguments are required: -m/--message, -c/--classification

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Try @pytorchbot --help for more info.

@jeffdaily (Collaborator):

@pytorchbot revert -m "prematurely dropped support for gfx900/gfx906" -c weird

@pytorchmergebot (Collaborator):

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot (Collaborator):

@jithunnair-amd your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Dec 13, 2024
…)"

This reverts commit 1e2b841.

Reverted #142827 on behalf of https://github.com/jeffdaily due to prematurely dropped support for gfx900/gfx906 ([comment](#142827 (comment)))
@IMbackK (Contributor) commented Dec 18, 2024

@jeffdaily @jithunnair-amd maybe pursue using the offload compression support recently added to LLVM as an alternative?

@FlorianHeigl:

suggestion: generally, try to focus a bit on things other than forcibly reducing your user base.

@jeffdaily (Collaborator):

#143986 added --offload-compress to our builds to help reduce our binary size without removing gfx archs.

There is effort underway to support generic targets, as well.
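As a rough sketch of the idea (the flags and arch list here are illustrative; the actual build integration lives in #143986), compressing device code bundles at compile time looks something like:

```shell
# Illustrative only: --offload-compress asks the compiler to compress
# the per-arch device code bundles embedded in the fat binary, so that
# keeping older archs costs less binary size. Arch list is an assumption.
hipcc --offload-compress \
      --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx90a \
      -c my_kernel.hip -o my_kernel.o
```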

@snarkyalyx commented Sep 28, 2025

@jithunnair-amd Who cares about the increasing size of ROCm binaries? Have you listened to your users?

People want long-term support so that prosumer and consumer use can be fulfilled. The compute is there, the 6700 XT is a modern card, and you shouldn't be ending support so soon, especially since this is an argument for people to switch to NVIDIA, which supported Kepler GPUs (released 2012) until September 2024.


Labels

ci-no-td Do not run TD on this PR ciflow/rocm Trigger "default" config CI on ROCm Merged module: rocm AMD GPU support for Pytorch open source Reverted topic: not user facing topic category


9 participants