Conversation

@micmelesse
Contributor

This PR is a follow-up to the following PRs:
#69942
#72682
#72809
#73543

We are adding support for Navi21 GPUs, which have a warp size of 32. On the host side we cannot rely on a compile-time constant, so we have to look up the warp size dynamically when launching a kernel. Inside device functions this is not needed: the compiler can substitute the correct warp size for the C10_WARP_SIZE constant.

@pytorch-bot pytorch-bot bot added module: rocm AMD GPU support for Pytorch ciflow/default labels Feb 28, 2022
@pytorch-bot

pytorch-bot bot commented Feb 28, 2022

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/micmelesse/pytorch/blob/9c83c142fbffac700fb5a25cd54fc3a1c2094f2e/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-manywheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-bionic-rocm4.5-py3.7 ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
macos-arm64-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-arm64-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
macos-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
windows-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
pytorch-xla-linux-bionic-py3.7-clang8 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk, ciflow/xla 🚫 skipped

@facebook-github-bot
Contributor

facebook-github-bot commented Feb 28, 2022

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 9c83c14 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@micmelesse
Contributor Author

@ngimel @malfet Here is the 5th Navi PR. Everything passes.

@dagitses dagitses requested a review from ngimel March 3, 2022 12:48
@dagitses dagitses added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Mar 3, 2022
@facebook-github-bot
Contributor

@ngimel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@micmelesse
Contributor Author

@ngimel Did something break here? The commit has not been merged by the bot.

facebook-github-bot pushed a commit that referenced this pull request Mar 8, 2022
Pull Request resolved: #73545

Reviewed By: jbschlosser

Differential Revision: D34616171

Pulled By: ngimel

fbshipit-source-id: d9b3a17de2457e33ddc5d9c817799a1c85826ccb
@github-actions
Contributor

github-actions bot commented Mar 8, 2022

Hey @micmelesse.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 9, 2022
Pull Request resolved: pytorch/pytorch#73545

Reviewed By: jbschlosser

Differential Revision: D34616171

Pulled By: ngimel

fbshipit-source-id: d9b3a17de2457e33ddc5d9c817799a1c85826ccb
(cherry picked from commit f54b12c642e4b33cf9bf27f93f628e147dc37ddc)
pytorchmergebot pushed a commit that referenced this pull request Mar 21, 2022
Pull Request resolved: #73546
Approved by: https://github.com/osalpekar
facebook-github-bot pushed a commit that referenced this pull request Mar 22, 2022
Pull Request resolved: #73546
Approved by: https://github.com/osalpekar

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/14a891f38eb205169062e126ba81b5c9ececfc44

Reviewed By: malfet

Differential Revision: D35026052

fbshipit-source-id: 64fb14d39199ccf2dafdb7e63b5fe78da315abf5
pytorchmergebot pushed a commit that referenced this pull request Mar 25, 2022
Pull Request resolved: #73548
Approved by: https://github.com/ngimel
shahofblah pushed a commit that referenced this pull request Mar 25, 2022
Pull Request resolved: #73546
Approved by: https://github.com/osalpekar
pytorchmergebot pushed a commit that referenced this pull request Mar 26, 2022
Pull Request resolved: #73549
Approved by: https://github.com/malfet
facebook-github-bot pushed a commit that referenced this pull request Mar 29, 2022
Pull Request resolved: #73548
Approved by: https://github.com/ngimel

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/cd929f403f1a5d0a4feb9ec5a6bc6fe918d39a6e

Reviewed By: malfet

Differential Revision: D35188054

fbshipit-source-id: 630b45ba6b4d5b1386fcc0f8c979f41924fe9651
facebook-github-bot pushed a commit that referenced this pull request Mar 30, 2022
Pull Request resolved: #73549
Approved by: https://github.com/malfet

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/56e0537e4e4fa209f70f0a08e82856c92c465162

Reviewed By: malfet

Differential Revision: D35188063

fbshipit-source-id: b625fcff4acfa892a638b3cedde6c2818e68cd47
facebook-github-bot pushed a commit that referenced this pull request Apr 7, 2022
Pull Request resolved: #73550

Reviewed By: malfet

Differential Revision: D35444958

Pulled By: ngimel

fbshipit-source-id: c65f06d3227c23bb097a71fc6c86e3f884114e04
pytorchmergebot pushed a commit that referenced this pull request Apr 7, 2022
Pull Request resolved: #73550

Reviewed By: malfet

Differential Revision: D35444958

Pulled By: ngimel

fbshipit-source-id: c65f06d3227c23bb097a71fc6c86e3f884114e04
(cherry picked from commit 7f3ba52)
jeffdaily pushed a commit to ROCm/pytorch that referenced this pull request Jul 11, 2022
Pull Request resolved: pytorch#73545

Reviewed By: jbschlosser

Differential Revision: D34616171

Pulled By: ngimel

fbshipit-source-id: d9b3a17de2457e33ddc5d9c817799a1c85826ccb
(cherry picked from commit f54b12c)
jeffdaily pushed a commit to ROCm/pytorch that referenced this pull request Jul 11, 2022
Pull Request resolved: pytorch#73546
Approved by: https://github.com/osalpekar
jeffdaily pushed a commit to ROCm/pytorch that referenced this pull request Jul 11, 2022
Pull Request resolved: pytorch#73548
Approved by: https://github.com/ngimel
jeffdaily pushed a commit to ROCm/pytorch that referenced this pull request Jul 11, 2022
Pull Request resolved: pytorch#73549
Approved by: https://github.com/malfet
jeffdaily pushed a commit to ROCm/pytorch that referenced this pull request Jul 11, 2022
Pull Request resolved: pytorch#73550

Reviewed By: malfet

Differential Revision: D35444958

Pulled By: ngimel

fbshipit-source-id: c65f06d3227c23bb097a71fc6c86e3f884114e04
(cherry picked from commit 7f3ba52)

Labels

cla signed module: rocm AMD GPU support for Pytorch open source triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
