
[ROCm][inductor] Codegen support for fast_tanhf #162052

Closed
jataylo wants to merge 12 commits into pytorch:main from jataylo:jack-fast-tanh

Conversation

@jataylo
Collaborator

@jataylo jataylo commented Sep 3, 2025

Improve tanh performance on ROCm when the data type is float32 or lower precision. For float64, retain the current implementation.

Requires commits that will be present in Triton 3.6.
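
The dtype policy described above can be sketched as a small dispatch helper. This is illustrative only: `pick_tanh_impl` is a hypothetical name, and the real selection happens inside inductor's Triton codegen, not through a standalone function like this.

```python
# Hypothetical sketch of the dtype policy: float64 keeps the precise
# libdevice.tanh, while float32 and lower-precision dtypes are eligible
# for the fast approximation on ROCm. Names here are illustrative.
def pick_tanh_impl(dtype: str) -> str:
    if dtype == "float64":
        return "libdevice.tanh"  # retain the current, fully precise path
    # float32, float16, bfloat16, ...
    return "libdevice.fast_tanhf"
```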

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @mlazos @dllehr-amd

@jataylo jataylo changed the title from "Add inductor codegen support for fast tanh path" to "[ROCm] Add inductor codegen support for fast tanh path" Sep 3, 2025
@pytorch-bot pytorch-bot bot added the ciflow/rocm (Trigger "default" config CI on ROCm) and module: rocm (AMD GPU support for Pytorch) labels Sep 3, 2025
@pytorch-bot

pytorch-bot bot commented Sep 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162052

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (7 Unrelated Failures)

As of commit 054fe6c with merge base 8b4f89e:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jeffdaily
Collaborator

Requires triton 2.9 bump first

@jataylo does it make sense to conditionalize this if statement to work with either 2.9 or previous?

@jataylo
Collaborator Author

jataylo commented Sep 4, 2025

Requires triton 2.9 bump first

@jataylo does it make sense to conditionalize this if statement to work with either 2.9 or previous?

Actually, maybe we should even conditionalise this based on the Triton version, so we have backward-compatibility support for older Triton versions.
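
A minimal sketch of what such version gating could look like, assuming a simple helper that parses the installed Triton version string into a comparable tuple. `parse_triton_version`, `tanh_codegen`, and the `(3, 6)` threshold are illustrative, not the PR's actual implementation:

```python
# Illustrative version gate, not inductor's actual codegen.
def parse_triton_version(version: str) -> tuple:
    """Map '3.6.0' (or '3.6.0+gitabc123') to a comparable tuple like (3, 6)."""
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split(".")[:2])

def tanh_codegen(x: str, use_fast_math: bool, is_hip: bool, triton_version: str) -> str:
    # Take the fast path only on ROCm with fast-math enabled and a
    # new-enough Triton; older Tritons fall back to libdevice.tanh for BC.
    if use_fast_math and is_hip and parse_triton_version(triton_version) >= (3, 6):
        return f"libdevice.fast_tanhf({x})"
    return f"libdevice.tanh({x})"
```

Gating on a parsed tuple rather than a string comparison keeps, say, "3.10" ordering correctly after "3.6".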

@jataylo
Collaborator Author

jataylo commented Sep 23, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/162052/head returned non-zero exit code 1

Rebasing (1/11)
Rebasing (2/11)
Auto-merging torch/_inductor/codegen/triton.py
CONFLICT (content): Merge conflict in torch/_inductor/codegen/triton.py
error: could not apply 6d7e2ae0e8b... Update triton.py
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply 6d7e2ae0e8b... # Update triton.py

Raised by https://github.com/pytorch/pytorch/actions/runs/17941770194

@jataylo
Collaborator Author

jataylo commented Sep 23, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/162052/head returned non-zero exit code 1

Rebasing (1/6)
Rebasing (2/6)
Auto-merging torch/_inductor/codegen/triton.py
CONFLICT (content): Merge conflict in torch/_inductor/codegen/triton.py
error: could not apply 6d7e2ae0e8b... Update triton.py
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply 6d7e2ae0e8b... # Update triton.py

Raised by https://github.com/pytorch/pytorch/actions/runs/17948442905

def tanh(x):
    if config.use_fast_math and torch.version.hip:
        if get_triton_version() > (3, 4):
            return f"libdevice.fast_tanhf({x})"
    return f"libdevice.tanh({x})"
Collaborator

Just to double-check, do we want this to work when you are on the triton-lang main branch, on a commit that is after the Triton 3.4 cut?

@naromero77amd
Collaborator

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased jack-fast-tanh onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout jack-fast-tanh && git pull --rebase)

from torch.utils._ordered_set import OrderedSet
from torch.utils._sympy.functions import CeilDiv, FloorDiv, ModularIndexing
from torch.utils._triton import has_triton_package, has_triton_stable_tma_api
from torch.utils._sympy.value_ranges import bound_sympy
Collaborator

Is this import bound_sympy something that got committed by mistake?

@jataylo jataylo requested a review from shunting314 December 5, 2025 10:51
@naromero77amd
Collaborator

Manually tested that with Triton 3.6 and TORCHINDUCTOR_USE_FAST_MATH=1 we get libdevice.fast_tanhf, and that with Triton 3.5 and TORCHINDUCTOR_USE_FAST_MATH=1 we get libdevice.tanh.
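
For context, the toggle used in that test is an environment variable. A minimal sketch of reading such a flag (illustrative only; `use_fast_math_enabled` is a hypothetical helper, and inductor's real config plumbing differs):

```python
import os

# Illustrative reader for the TORCHINDUCTOR_USE_FAST_MATH toggle mentioned
# above; inductor's actual config handling differs.
def use_fast_math_enabled() -> bool:
    return os.environ.get("TORCHINDUCTOR_USE_FAST_MATH", "0") == "1"
```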

@naromero77amd
Collaborator

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Dec 5, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / inductor-build-cuda13 / build

Details for Dev Infra team Raised by workflow job

@naromero77amd naromero77amd added and removed the ciflow/trunk (Trigger trunk jobs on your pull request) label Dec 5, 2025
@naromero77amd
Collaborator

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

umechand-amd pushed a commit to ROCm/pytorch that referenced this pull request Dec 8, 2025
Improve tanh performance on ROCm when the data type is float32 or lower precision. For float64, retain the current implementation.

Requires commits that will be present in Triton 3.6.

Pull Request resolved: pytorch#162052
Approved by: https://github.com/jeffdaily, https://github.com/naromero77amd, https://github.com/eellison, https://github.com/mlazos, https://github.com/v0i0, https://github.com/shunting314

Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
naromero77amd added a commit to ROCm/pytorch that referenced this pull request Dec 8, 2025
Improve tanh performance on ROCm when the data type is float32 or lower precision. For float64, retain the current implementation.

Requires commits that will be present in Triton 3.6.

Pull Request resolved: pytorch#162052
Approved by: https://github.com/jeffdaily, https://github.com/naromero77amd, https://github.com/eellison, https://github.com/mlazos, https://github.com/v0i0, https://github.com/shunting314

Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
(cherry picked from commit 9b885b0)
JacobSzwejbka pushed a commit that referenced this pull request Dec 8, 2025
Improve tanh performance on ROCm when the data type is float32 or lower precision. For float64, retain the current implementation.

Requires commits that will be present in Triton 3.6.

Pull Request resolved: #162052
Approved by: https://github.com/jeffdaily, https://github.com/naromero77amd, https://github.com/eellison, https://github.com/mlazos, https://github.com/v0i0, https://github.com/shunting314

Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
jataylo added a commit to ROCm/pytorch that referenced this pull request Dec 9, 2025
…ytorch#162052) (#2860)

Improve tanh performance on ROCm when the data type is float32 or lower
precision. For float64, retain the current implementation.

Requires commits that will be present in Triton 3.6.

Pull Request resolved: pytorch#162052
Approved by: https://github.com/jeffdaily,
https://github.com/naromero77amd, https://github.com/eellison,
https://github.com/mlazos, https://github.com/v0i0,
https://github.com/shunting314


(cherry picked from commit 9b885b0)

Other notes:
- Similar to upstream PyTorch except we backported fast_tanhf back to
Triton 3.3.
- Resolves
[SWDEV-569928](https://ontrack-internal.amd.com/browse/SWDEV-569928)
- Resolves inductor failures here
[SWDEV-557937](https://ontrack-internal.amd.com/browse/SWDEV-557937)

Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>
naromero77amd added a commit to ROCm/pytorch that referenced this pull request Dec 9, 2025
…ytorch#162052) (#2860)

Improve tanh performance on ROCm when the data type is float32 or lower
precision. For float64, retain the current implementation.

Requires commits that will be present in Triton 3.6.

Pull Request resolved: pytorch#162052
Approved by: https://github.com/jeffdaily,
https://github.com/naromero77amd, https://github.com/eellison,
https://github.com/mlazos, https://github.com/v0i0,
https://github.com/shunting314

(cherry picked from commit 9b885b0)

Other notes:
- Similar to upstream PyTorch except we backported fast_tanhf back to
Triton 3.3.
- Resolves
[SWDEV-569928](https://ontrack-internal.amd.com/browse/SWDEV-569928)
- Resolves inductor failures here
[SWDEV-557937](https://ontrack-internal.amd.com/browse/SWDEV-557937)

Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>
(cherry picked from commit c82113e)
jataylo added a commit to ROCm/pytorch that referenced this pull request Dec 10, 2025
…ytorch#162052) (#2864)

Improve tanh performance on ROCm when the data type is float32 or lower
precision. For float64, retain the current implementation.

Requires commits that will be present in Triton 3.6.

Pull Request resolved: pytorch#162052
Approved by: https://github.com/jeffdaily,
https://github.com/naromero77amd, https://github.com/eellison,
https://github.com/mlazos, https://github.com/v0i0,
https://github.com/shunting314

(cherry picked from commit 9b885b0)

Other notes:
- Similar to upstream PyTorch except we backported fast_tanhf back to
Triton 3.3.
- Resolves
[SWDEV-569928](https://ontrack-internal.amd.com/browse/SWDEV-569928)
- Resolves inductor failures here
[SWDEV-557937](https://ontrack-internal.amd.com/browse/SWDEV-557937)


(cherry picked from commit c82113e)

Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>
jataylo added a commit to ROCm/pytorch that referenced this pull request Dec 10, 2025
…ytorch#162052) (#2866)

Improve tanh performance on ROCm when the data type is float32 or lower
precision. For float64, retain the current implementation.

Requires commits that will be present in Triton 3.6.

Pull Request resolved: pytorch#162052
Approved by: https://github.com/jeffdaily,
https://github.com/naromero77amd, https://github.com/eellison,
https://github.com/mlazos, https://github.com/v0i0,
https://github.com/shunting314

(cherry picked from commit 9b885b0)

Other notes:
- Similar to upstream PyTorch except we backported fast_tanhf back to
Triton 3.3.
- Resolves
[SWDEV-569928](https://ontrack-internal.amd.com/browse/SWDEV-569928)
- Resolves inductor failures here
[SWDEV-557937](https://ontrack-internal.amd.com/browse/SWDEV-557937)

(cherry picked from commit c82113e)

Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>

Labels

ciflow/inductor
ciflow/inductor-rocm (Trigger "inductor" config CI on ROCm)
ciflow/rocm (Trigger "default" config CI on ROCm)
ciflow/trunk (Trigger trunk jobs on your pull request)
keep-going (Don't stop on first failure, keep running tests until the end)
Merged
module: inductor
module: rocm (AMD GPU support for Pytorch)
open source
release notes: inductor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants