I have a standard FashionMNIST training script that runs well at full precision (fp32). I've run it on Nvidia devices and, more recently, on the Radeon 8060S (GMKTec Evo-X2). I implemented mixed precision, trying both bfloat16 and float16. (FYI: torch.cuda.amp is now deprecated. A warning says to use torch.amp.[...] and pass device_type.)
This speeds things up quite a bit on my Nvidia devices, so I tried it on my Radeon recently. It runs, but it is actually slower than the fp32 version, and I'm not quite sure why. This is on Ubuntu 24.04.3 with ROCm 7 after following the installs at:
What's more confusing is that the PC came with Windows natively. I had originally used an unofficial PyTorch build with ROCm from TheRock: https://github.com/scottt/rocm-TheRock/releases/tag/v6.5.0rc-pytorch-gfx110x This was pretty good and sped things up similarly to Nvidia when using mixed precision. However, being an unofficial release, it had a number of bugs. I'm wondering if anyone else has had this problem. If so, are there any red flags to watch for, namely compatibility issues that can arise with certain torch or ROCm versions, or even Ubuntu versions (is 22.04 better)? Would appreciate any tips you may have!
Replies: 1 comment
I solved this problem by installing a nightly PyTorch build specific to my gfx:
Enjoy your new ROCm PyTorch on Windows (experimental) ※(^o^)/※
1. Install ROCm if you need it (substitute your gfx target if gfx1151 is not yours; on Linux you can find it with `rocminfo | grep gfx`):

   ```shell
   pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ "rocm[libraries,devel]"
   ```

2. Install the nightly torch build supporting the 8060S's gfx:

   ```shell
   pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ --pre torch torchaudio torchvision
   ```

3. Verify:

   ```shell
   python -c "import torch; print(torch.cuda.is_available())"
   ```

   It might say CUDA here, but if it prints True it means the ROCm-compiled torch build was correctly installed.
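Beyond `torch.cuda.is_available()`, a slightly stronger sanity check from Python, assuming nothing beyond standard torch attributes (`torch.version.hip` is populated only on ROCm builds, so it distinguishes a ROCm wheel from a CUDA or CPU one):

```python
import torch

print(torch.__version__)   # ROCm nightlies typically carry a +rocm suffix
print(torch.version.hip)   # None on CUDA/CPU builds, a HIP version string on ROCm builds
if torch.cuda.is_available():
    # should name the Radeon GPU on a working ROCm install
    print(torch.cuda.get_device_name(0))
```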