PyTorch on ROCm v6.5.0rc (gfx1151 / AMD Strix Halo / Ryzen AI Max+ 395) Detecting Only 15.49GB VRAM Despite 96GB Usable #5152
Replies: 3 comments
-
I'm having the exact same issue. Can anyone please offer some advice on what to try, what info to grab, etc.? It would be fantastic not to have NVIDIA as the only working option.
-
My rocminfo shows 4 pools and all are full sized:
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 131159480(0x7d155b8) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
This is using ROCm 6.4.3, where it shows 4 pools, and TheRock/ROCm 7.0 nightly, where it shows 3 pools. I don't know about your exact setup, but some recommendations:
* Use the latest kernel. I'm currently using 6.17.0-rc1-1-mainline, but 6.15+ should be fine; newer kernels have more fixes, though.
* Use the latest linux-firmware - there are some major fixes that were only recently upstreamed; really, the more up-to-date the better.
* ROCm 6.4.1, I believe, was the first release with minimal gfx1151 support, and I believe 6.4.2 or 6.4.3 was the first to introduce hipBLASLt for gfx1151. If your distro doesn't have up-to-date packages, I've found TheRock/ROCm nightlies (either via tarball or the pip helper) to be the best way to get gfx1151 ROCm support: https://github.com/ROCm/TheRock/blob/main/RELEASES.md
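For scale, the `Size:` value in the pool listing above is in KiB, so that pool works out to roughly 125 GiB, which looks like nearly all of a 128 GiB system (my assumption about the machine size). A one-line sanity check in Python:

```python
# rocminfo reports "Size:" in KiB; convert the pool above to GiB.
size_kib = 131159480
print(f"{size_kib / 1024**2:.2f} GiB")  # 125.08 GiB, vs ~15.49 GiB on the affected setup
```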
I've been writing up some docs here: https://strixhalo-homelab.d7.wtf/AI/AI-Capabilities-Overview but for advanced usage/WIP notes, you can check some of my original working notes when I was poking around directly: https://llm-tracker.info/_TOORG/Strix-Halo |
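To confirm what a given PyTorch build actually sees on gfx1151, a minimal sketch along these lines can help (torch.version.hip is only populated on ROCm builds, and gcnArchName may be absent on older wheels, hence the getattr):

```python
# Minimal sketch: report the HIP runtime, GPU arch, and memory PyTorch sees.
import torch

print("torch:", torch.__version__)
print("HIP:", torch.version.hip)  # None on CUDA builds

props = torch.cuda.get_device_properties(0)
print("device:", props.name)
print("arch:", getattr(props, "gcnArchName", "n/a"))  # expect gfx1151
print(f"total_memory: {props.total_memory / 1024**3:.2f} GiB")

free, total = torch.cuda.mem_get_info(0)
print(f"free/total: {free / 1024**3:.2f} / {total / 1024**3:.2f} GiB")
```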
-
Leonard,
Thank you for this information; we will try the suggestions and report back.
At this point, we have PyTorch and Python working without issue, and we have managed to get Qwen 2.5 VL 32B running and fine-tuning.
Our last hurdle is getting MMDetection running properly.
We will report back soon, and if you've heard of anyone getting MMDetection running, please let us know.
Thanks again,
Trevor Chandler
-
Hi ROCm Team,
I'm running into an issue where PyTorch built for ROCm (v6.5.0rc from [scottt/rocm-TheRock](https://github.com/scottt/rocm-TheRock/releases/tag/v6.5.0rc-pytorch)) on an AMD Strix Halo machine (gfx1151) is only detecting 15.49 GB of VRAM, even though ROCm and rocm-smi report 96 GB of VRAM available.
❯ System Setup:
(checked with rocm-smi, rocminfo, and glxinfo)
❯ rocm-smi VRAM Report:
command:
output:
❯ rocminfo Output Summary:
GPU Agent (gfx1151) reports two global memory pools:
So from ROCm's HSA agent side, only about 15.49 GB is visible for each global segment. But rocm-smi and glxinfo show 96 GB as accessible.
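For anyone gathering the same logs, one compact way to pull the reported sizes out of rocminfo; a sketch, assuming the `Size: <n>(0x...) KB` line format shown in this thread:

```python
# Sketch: list every "Size:" entry rocminfo reports, converted to GiB.
# Note: this also picks up cache sizes; filter to the "Pool Info"
# sections if you only want memory pools.
import re
import subprocess

out = subprocess.run(["rocminfo"], capture_output=True, text=True).stdout
for kib in (int(m) for m in re.findall(r"Size:\s*(\d+)\(", out)):
    print(f"size: {kib / 1024**2:.2f} GiB")
```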
❯ glxinfo:
command:
output:
❯ PyTorch VRAM Check (via torch.cuda.get_device_properties(0).total_memory):
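A minimal version of that check:

```python
import torch

props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.2f} GiB")
# Reports ~15.49 GiB here instead of the expected 96 GiB.
```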
❯ Full Python Test Output:
❯ Questions / Clarifications:
Why do the HSA agent's pools (and therefore PyTorch) expose only ~15.49 GB when rocm-smi and glxinfo clearly indicate that 96 GB is present and usable?
Happy to provide any additional logs or test specific builds if needed. This GPU is highly promising for a wide range of applications, and I plan to use it to train models.
Thanks for the great work on ROCm so far!