-
Notifications
You must be signed in to change notification settings - Fork 26.3k
PyTorch AMD Build Scripts #6625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @Jorghi12, does this mean we are ready to setup AMD builders on CI? Also, directory structure bikeshedding: can we put this inside |
amd_build/build_pytorch_amd.py
Outdated
| import subprocess | ||
| import os | ||
|
|
||
| cwd = os.getcwd() |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
amd_build/build_pytorch_amd.py
Outdated
|
|
||
| cwd = os.getcwd() | ||
| proj_dir = os.path.dirname(cwd) | ||
| out_dir = os.path.join(os.path.dirname(proj_dir), "pytorch_amd") |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
amd_build/build_pytorch_amd.py
Outdated
| shutil.copy(os.path.join(cwd, "hip_files/THC/generic/THCTensorRandom.cu.hip"), os.path.join(out_dir, "aten/src/THC/generic/THCTensorRandom.cu")) | ||
|
|
||
| # Move to avoid HCC bug. | ||
| shutil.move(os.path.join(out_dir, "aten/src/ATen/native/cudnn/Conv.cpp"), os.path.join(out_dir, "aten/src/ATen/native/cudnn/ConvCuDNN.cpp")) |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
amd_build/build_pytorch_amd.py
Outdated
|
|
||
| # Execute the Hipify Script. | ||
| subprocess.Popen( | ||
| ["/opt/rocm/bin/hipify-python.py", |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| @@ -0,0 +1,454 @@ | |||
| CMAKE_MINIMUM_REQUIRED(VERSION 2.8) | |||
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| @@ -0,0 +1,878 @@ | |||
| #ifndef THC_APPLY_INC | |||
| #define THC_APPLY_INC | |||
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
|
Looking at the patch that the build script applies to master, it seems sufficiently big that the next time someone merges a change to anything in THC, it will immediately stop working. In general, I am wondering, would it make more sense to maintain this as a branch, until we can mainline the source code changes so that they work for AMD as well as CUDA? (I was not privy to any of the original discussions, so I apologize if this is something you've already looked at.) |
Use the actual path for the file instead of the current working directory, which depends on where the script is invoked.
…iles with the same name results in a linking error in the HCC compiler used for ROCm/AMD.
…g up relevant hip paths.
…es while building.
ed8ef6a to
c3372ee
Compare
Replacing "WITH_CUDA" with "NOT NO_CUDA" after the rebase.
apaszke
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. One minor nit. Should be ready to merge once @ezyang approves the CMake part.
setup.py
Outdated
| DL = Extension("torch._dl", | ||
| sources=["torch/csrc/dl.c"], | ||
| language='c', | ||
| extra_link_args=[] |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
|
@pytorchbot retest this please |
|
The build for pr/pytorch-linux-xenial-py3-clang5-asan was hanging for 3 hours previously so I restarted the build on Jenkins. Finished: success https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-xenial-py3-clang5-asan-build/4652/console |
* PyTorch AMD Build Script. * Python invocation for hipify * Adding individual hip fles. * Updating CWD Use the actual path for the file instead of the current working directory, which depends on where the script is invoked. * Updating folder path for amd_build * Removing previous amd_build directory * Updated setup.py to support WITH_ROCM * Renaming the files for CuDNN BatchNorm & Conv since having two .cpp files with the same name results in a linking error in the HCC compiler used for ROCm/AMD. * Removing old BatchNorm & Conv files since they've been renamed. * Updating build path to handle ROCM * Cleaned up the build path and created a FindHIP cmake file for setting up relevant hip paths. * Seperated the individual patch files to make it easier to detect issues while building. * Removed CMakeLists hip files and fixed directory structure * Adding build pytorch amd script * Merged setup patch into PyTorch setup.py & cleaned a few issues * Added information on where to download the hipify-python script. * Resolved linting issues inside of build_pytorch_amd.py * Removing many unnecessary patch files. Removing unnecessary .hip files. Fixing up the build process. * Refactored the PR for supporting HIP * Minimizing the number of changes inside individual patches. * Cleaned up patch files. * Removed patch files. * Updating patches * Removing HIP change from file. * Cleaned up patches * Added AVX/SSE avoidance due to bug with ROCms stack. Just temporary for now. * Removing the other HIP file * Removed patch file + merged ROCm into Aten/test * Removed ATen tests patch file and updated disbale_features yaml to remove headers that don't exist on the HIP stack. * Reduced the number of patches down to 14 after Edward's suggestions. * Transferred deletion of certain functions from patch to yaml file. * Set default Thrust path * Fixed aten files so we now use the templated pow/abs instead of std:: directly. * Removed error from aten/src/THCUNN/Abs.cu * Updated the locations of the cmake build files. Moved THCTensorRandom from a hip to a patch file. Added executable/library commands that can successfully handle either CUDA or HIP. * Removed hip extraction from the build script and removed the old hip file. * Replaced MACRO with function in upper level cmake. * Added empty ELSE() block to prevent the loading of a command without CUDA or HIP. Also added IF guards around torch_cuda_based_add_executable in Aten tests. * Updated aten tests. * Removed the hip include from the ATen header. * Can't throw exceptions on C++ AMP, using abort * Missing IF guards for cuda/hip executables in aten tests. * Removed a series of patch files. * Added template keyword to help out the HCC compiler. * Rebased the specific files displayed in the PR * Fixing typo. * Change flag from "WITH_CUDA" to "NOT NO_CUDA" Replacing "WITH_CUDA" with "NOT NO_CUDA" after the rebase. * Fix LoadHIP path * Updating build files after rebasing. * Reorganization after cpu/gpu separation. * Removed HIPCC from setup.py & removed -shared extra linking args. * Updated CMake / Setup build to correctly link when under ROCm stack. * Removed the unnecessary argument from Extension constructor. * Adding another test to be included with ROCm building. * Updated the setup_helpers scripts in order to get around linter error * Fix syntax issue * Solving lint issue: line too long
The PyTorch script files necessary for building on AMD GPUs.