[ROCm][inductor] Codegen support for fast_tanhf #162052
jataylo wants to merge 12 commits into pytorch:main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162052
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (7 Unrelated Failures)
As of commit 054fe6c with merge base 8b4f89e:
FLAKY - The following job failed but was likely due to flakiness present on trunk.
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@jataylo does it make sense to conditionalize this if statement to work with either 2.9 or previous?

Actually, maybe we should even conditionalize this based on the Triton version, so we have BC support for older Triton versions.
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.

Rebase failed. Raised by https://github.com/pytorch/pytorch/actions/runs/17941770194
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.

Rebase failed. Raised by https://github.com/pytorch/pytorch/actions/runs/17948442905
torch/_inductor/codegen/triton.py (Outdated)

    def tanh(x):
        return f"libdevice.tanh({x})"
        if config.use_fast_math and torch.version.hip:
            if get_triton_version() > (3, 4):
Just to double-check: do we want this to work for the case where you are on the triton-lang main branch, on a commit that is after the Triton 3.4 cut?
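For reference, here is a minimal sketch (not the PR's actual helper, whose import path and exact behavior are not shown in this excerpt) of how a tuple-based gate like `get_triton_version() > (3, 4)` would behave for both release wheels and a main-branch build, assuming Triton reports a `major.minor.patch`-style `__version__`:

```python
# Hypothetical helper for illustration only; the diff's get_triton_version()
# is not reproduced here.
def triton_version_tuple() -> tuple[int, int]:
    try:
        import triton  # assumed installed; __version__ like "3.5.0" or "3.6.0+git..."
        major, minor = triton.__version__.split(".")[:2]
        return (int(major), int(minor))
    except (ImportError, ValueError):
        return (0, 0)

# A main-branch commit cut after the 3.4 release typically already reports
# 3.5 or newer, so a strict tuple comparison enables the fast path there too.
use_fast_tanh_gate = triton_version_tuple() > (3, 4)
```

So, assuming main-branch builds bump the reported version right after the release cut, the strict `> (3, 4)` comparison should cover that case as well.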
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.

Successfully rebased; force-pushed fdf2357 to 85d331e.
torch/_inductor/codegen/triton.py (Outdated)

    from torch.utils._ordered_set import OrderedSet
    from torch.utils._sympy.functions import CeilDiv, FloorDiv, ModularIndexing
    from torch.utils._triton import has_triton_package, has_triton_stable_tma_api
    from torch.utils._sympy.value_ranges import bound_sympy
Is this `bound_sympy` import something that got committed by mistake?
Manually tested that with Triton 3.6 and
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 job has failed, first few of them are: trunk / inductor-build-cuda13 / build. Details for Dev Infra team: raised by workflow job.

@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Improve tanh performance on ROCm when the data type is float32 or lower precision. For float64, retain the current implementation. Requires commits that will be present in Triton 3.6.

Pull Request resolved: pytorch#162052
Approved by: https://github.com/jeffdaily, https://github.com/naromero77amd, https://github.com/eellison, https://github.com/mlazos, https://github.com/v0i0, https://github.com/shunting314
Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
[ROCm][inductor] Codegen support for fast_tanhf (pytorch#162052) (#2860)

Improve tanh performance on ROCm when the data type is float32 or lower precision. For float64, retain the current implementation. Requires commits that will be present in Triton 3.6.

Pull Request resolved: pytorch#162052
Approved by: https://github.com/jeffdaily, https://github.com/naromero77amd, https://github.com/eellison, https://github.com/mlazos, https://github.com/v0i0, https://github.com/shunting314
(cherry picked from commit 9b885b0)

Other notes:
- Similar to upstream PyTorch except we backported fast_tanhf back to Triton 3.3.
- Resolves [SWDEV-569928](https://ontrack-internal.amd.com/browse/SWDEV-569928)
- Resolves inductor failures here [SWDEV-557937](https://ontrack-internal.amd.com/browse/SWDEV-557937)

Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>
(cherry picked from commit c82113e)
[ROCm][inductor] Codegen support for fast_tanhf (pytorch#162052) (#2864)

Improve tanh performance on ROCm when the data type is float32 or lower precision. For float64, retain the current implementation. Requires commits that will be present in Triton 3.6.

Pull Request resolved: pytorch#162052
Approved by: https://github.com/jeffdaily, https://github.com/naromero77amd, https://github.com/eellison, https://github.com/mlazos, https://github.com/v0i0, https://github.com/shunting314
(cherry picked from commit 9b885b0)

Other notes:
- Similar to upstream PyTorch except we backported fast_tanhf back to Triton 3.3.
- Resolves [SWDEV-569928](https://ontrack-internal.amd.com/browse/SWDEV-569928)
- Resolves inductor failures here [SWDEV-557937](https://ontrack-internal.amd.com/browse/SWDEV-557937)

(cherry picked from commit c82113e)
Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>
[ROCm][inductor] Codegen support for fast_tanhf (pytorch#162052) (#2866)

Improve tanh performance on ROCm when the data type is float32 or lower precision. For float64, retain the current implementation. Requires commits that will be present in Triton 3.6.

Pull Request resolved: pytorch#162052
Approved by: https://github.com/jeffdaily, https://github.com/naromero77amd, https://github.com/eellison, https://github.com/mlazos, https://github.com/v0i0, https://github.com/shunting314
(cherry picked from commit 9b885b0)

Other notes:
- Similar to upstream PyTorch except we backported fast_tanhf back to Triton 3.3.
- Resolves [SWDEV-569928](https://ontrack-internal.amd.com/browse/SWDEV-569928)
- Resolves inductor failures here [SWDEV-557937](https://ontrack-internal.amd.com/browse/SWDEV-557937)

(cherry picked from commit c82113e)
Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>
Improve tanh performance on ROCm when the data type is float32 or lower precision. For float64, retain the current implementation.
Requires commits that will be present in Triton 3.6.
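For illustration, here is a minimal sketch of the shape this codegen override takes, based on the diff excerpt in the review above. The `fast_tanhf` spelling comes from the PR title, the version-gate helper is a local stand-in for the diff's `get_triton_version()` call, and the float64 dtype check is omitted, so this is not the merged source:

```python
# Minimal sketch only, assuming the names from the review excerpt.
import torch
from torch._inductor import config


def _triton_newer_than_3_4() -> bool:
    # Local stand-in for the diff's get_triton_version() > (3, 4) check.
    try:
        import triton
        major, minor = (int(v) for v in triton.__version__.split(".")[:2])
        return (major, minor) > (3, 4)
    except Exception:
        return False


def tanh(x):
    # On ROCm, with fast math enabled and a new-enough Triton, emit the faster
    # float32 tanh. "fast_tanhf" is the spelling from the PR title; the exact
    # string the merged codegen emits is not verified here. The merged change
    # also keeps float64 on libdevice.tanh; that dtype check is omitted below.
    if config.use_fast_math and torch.version.hip and _triton_newer_than_3_4():
        return f"libdevice.fast_tanhf({x})"
    return f"libdevice.tanh({x})"
```

With this sketch, calling `tanh("tmp0")` yields either `libdevice.fast_tanhf(tmp0)` or `libdevice.tanh(tmp0)` depending on whether the build is ROCm, fast math is enabled, and the installed Triton is newer than 3.4.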
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @mlazos @dllehr-amd