
[ROCm][inductor] Codegen support for fast_tanhf #162052

Closed
jataylo wants to merge 12 commits into pytorch:main from jataylo:jack-fast-tanh

Conversation

@jataylo
Collaborator

@jataylo jataylo commented Sep 3, 2025

Improve tanh performance on ROCm when the data type is float32 or lower precision. For float64, retain the current implementation.

Requires commits that will be present in Triton 3.6.
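
The dtype policy described above can be sketched as a small dispatch helper. This is illustrative only: `pick_tanh_impl` is a hypothetical name, and the real selection happens inside inductor's Triton codegen, not through a standalone function like this.

```python
# Hypothetical sketch of the dtype policy: float64 keeps the precise
# libdevice.tanh, while float32 and lower-precision dtypes are eligible
# for the fast approximation on ROCm. Names here are illustrative.
def pick_tanh_impl(dtype: str) -> str:
    if dtype == "float64":
        return "libdevice.tanh"  # retain the current, fully precise path
    # float32, float16, bfloat16, ...
    return "libdevice.fast_tanhf"
```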

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @mlazos @dllehr-amd

@jataylo jataylo changed the title from "Add inductor codegen support for fast tanh path" to "[ROCm] Add inductor codegen support for fast tanh path" Sep 3, 2025
@pytorch-bot pytorch-bot bot added the ciflow/rocm (Trigger "default" config CI on ROCm) and module: rocm (AMD GPU support for Pytorch) labels Sep 3, 2025
@pytorch-bot

pytorch-bot bot commented Sep 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162052

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (7 Unrelated Failures)

As of commit 054fe6c with merge base 8b4f89e:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jeffdaily
Collaborator

Requires triton 2.9 bump first

@jataylo does it make sense to conditionalize this if statement to work with either 2.9 or previous?

@jataylo
Collaborator Author

jataylo commented Sep 4, 2025

Requires triton 2.9 bump first

@jataylo does it make sense to conditionalize this if statement to work with either 2.9 or previous?

Actually, maybe we should even conditionalise this based on the Triton version, so we have backward-compatibility support for older Triton versions.
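
A minimal sketch of what such version gating could look like, assuming a simple helper that parses the installed Triton version string into a comparable tuple. `parse_triton_version`, `tanh_codegen`, and the `(3, 6)` threshold are illustrative, not the PR's actual implementation:

```python
# Illustrative version gate, not inductor's actual codegen.
def parse_triton_version(version: str) -> tuple:
    """Map '3.6.0' (or '3.6.0+gitabc123') to a comparable tuple like (3, 6)."""
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split(".")[:2])

def tanh_codegen(x: str, use_fast_math: bool, is_hip: bool, triton_version: str) -> str:
    # Take the fast path only on ROCm with fast-math enabled and a
    # new-enough Triton; older Tritons fall back to libdevice.tanh for BC.
    if use_fast_math and is_hip and parse_triton_version(triton_version) >= (3, 6):
        return f"libdevice.fast_tanhf({x})"
    return f"libdevice.tanh({x})"
```

Gating on a parsed tuple rather than a string comparison keeps, say, "3.10" ordering correctly after "3.6".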

@jataylo
Collaborator Author

jataylo commented Sep 23, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/162052/head returned non-zero exit code 1

Rebasing (1/11)
Rebasing (2/11)
Auto-merging torch/_inductor/codegen/triton.py
CONFLICT (content): Merge conflict in torch/_inductor/codegen/triton.py
error: could not apply 6d7e2ae0e8b... Update triton.py
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply 6d7e2ae0e8b... # Update triton.py

Raised by https://github.com/pytorch/pytorch/actions/runs/17941770194

@jataylo
Collaborator Author

jataylo commented Sep 23, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/162052/head returned non-zero exit code 1

Rebasing (1/6)
Rebasing (2/6)
Auto-merging torch/_inductor/codegen/triton.py
CONFLICT (content): Merge conflict in torch/_inductor/codegen/triton.py
error: could not apply 6d7e2ae0e8b... Update triton.py
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply 6d7e2ae0e8b... # Update triton.py

Raised by https://github.com/pytorch/pytorch/actions/runs/17948442905

def tanh(x):
    if config.use_fast_math and torch.version.hip:
        if get_triton_version() > (3, 4):
            return f"libdevice.fast_tanhf({x})"
    return f"libdevice.tanh({x})"
Collaborator

Just to double-check, do we want this to work when you are on the triton-lang main branch, on a commit that is after the Triton 3.4 cut?

@naromero77amd
Collaborator

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased jack-fast-tanh onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout jack-fast-tanh && git pull --rebase)

from torch.utils._ordered_set import OrderedSet
from torch.utils._sympy.functions import CeilDiv, FloorDiv, ModularIndexing
from torch.utils._triton import has_triton_package, has_triton_stable_tma_api
from torch.utils._sympy.value_ranges import bound_sympy
Collaborator

Is this import bound_sympy something that got committed by mistake?

@jataylo jataylo requested a review from shunting314 December 5, 2025 10:51
@naromero77amd
Collaborator

Manually tested that with Triton 3.6 and TORCHINDUCTOR_USE_FAST_MATH=1 we get libdevice.fast_tanhf, and that with Triton 3.5 and TORCHINDUCTOR_USE_FAST_MATH=1 we get libdevice.tanh.
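
For context, the toggle used in that test is an environment variable. A minimal sketch of reading such a flag (illustrative only; `use_fast_math_enabled` is a hypothetical helper, and inductor's real config plumbing differs):

```python
import os

# Illustrative reader for the TORCHINDUCTOR_USE_FAST_MATH toggle mentioned
# above; inductor's actual config handling differs.
def use_fast_math_enabled() -> bool:
    return os.environ.get("TORCHINDUCTOR_USE_FAST_MATH", "0") == "1"
```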

@naromero77amd
Collaborator

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Dec 5, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / inductor-build-cuda13 / build

Details for Dev Infra team Raised by workflow job

@naromero77amd naromero77amd added and removed the ciflow/trunk (Trigger trunk jobs on your pull request) label Dec 5, 2025
@naromero77amd
Collaborator

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

umechand-amd pushed a commit to ROCm/pytorch that referenced this pull request Dec 8, 2025
Improve tanh performance on ROCm when the data type is float32 or lower precision. For float64, retain the current implementation.

Requires commits that will be present in Triton 3.6.

Pull Request resolved: pytorch#162052
Approved by: https://github.com/jeffdaily, https://github.com/naromero77amd, https://github.com/eellison, https://github.com/mlazos, https://github.com/v0i0, https://github.com/shunting314

Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
naromero77amd added a commit to ROCm/pytorch that referenced this pull request Dec 8, 2025
Improve tanh performance on ROCm when the data type is float32 or lower precision. For float64, retain the current implementation.

Requires commits that will be present in Triton 3.6.

Pull Request resolved: pytorch#162052
Approved by: https://github.com/jeffdaily, https://github.com/naromero77amd, https://github.com/eellison, https://github.com/mlazos, https://github.com/v0i0, https://github.com/shunting314

Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
(cherry picked from commit 9b885b0)
JacobSzwejbka pushed a commit that referenced this pull request Dec 8, 2025
Improve tanh performance on ROCm when the data type is float32 or lower precision. For float64, retain the current implementation.

Requires commits that will be present in Triton 3.6.

Pull Request resolved: #162052
Approved by: https://github.com/jeffdaily, https://github.com/naromero77amd, https://github.com/eellison, https://github.com/mlazos, https://github.com/v0i0, https://github.com/shunting314

Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
jataylo added a commit to ROCm/pytorch that referenced this pull request Dec 9, 2025
…ytorch#162052) (#2860)

Improve tanh performance on ROCm when the data type is float32 or lower
precision. For float64, retain the current implementation.

Requires commits that will be present in Triton 3.6.

Pull Request resolved: pytorch#162052
Approved by: https://github.com/jeffdaily,
https://github.com/naromero77amd, https://github.com/eellison,
https://github.com/mlazos, https://github.com/v0i0,
https://github.com/shunting314


(cherry picked from commit 9b885b0)

Other notes:
- Similar to upstream PyTorch except we backported fast_tanhf back to
Triton 3.3.
- Resolves
[SWDEV-569928](https://ontrack-internal.amd.com/browse/SWDEV-569928)
- Resolves inductor failures here
[SWDEV-557937](https://ontrack-internal.amd.com/browse/SWDEV-557937)

Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>
naromero77amd added a commit to ROCm/pytorch that referenced this pull request Dec 9, 2025
…ytorch#162052) (#2860)

Improve tanh performance on ROCm when the data type is float32 or lower
precision. For float64, retain the current implementation.

Requires commits that will be present in Triton 3.6.

Pull Request resolved: pytorch#162052
Approved by: https://github.com/jeffdaily,
https://github.com/naromero77amd, https://github.com/eellison,
https://github.com/mlazos, https://github.com/v0i0,
https://github.com/shunting314

(cherry picked from commit 9b885b0)

Other notes:
- Similar to upstream PyTorch except we backported fast_tanhf back to
Triton 3.3.
- Resolves
[SWDEV-569928](https://ontrack-internal.amd.com/browse/SWDEV-569928)
- Resolves inductor failures here
[SWDEV-557937](https://ontrack-internal.amd.com/browse/SWDEV-557937)

Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>
(cherry picked from commit c82113e)
jataylo added a commit to ROCm/pytorch that referenced this pull request Dec 10, 2025
…ytorch#162052) (#2864)

Improve tanh performance on ROCm when the data type is float32 or lower
precision. For float64, retain the current implementation.

Requires commits that will be present in Triton 3.6.

Pull Request resolved: pytorch#162052
Approved by: https://github.com/jeffdaily,
https://github.com/naromero77amd, https://github.com/eellison,
https://github.com/mlazos, https://github.com/v0i0,
https://github.com/shunting314

(cherry picked from commit 9b885b0)

Other notes:
- Similar to upstream PyTorch except we backported fast_tanhf back to
Triton 3.3.
- Resolves
[SWDEV-569928](https://ontrack-internal.amd.com/browse/SWDEV-569928)
- Resolves inductor failures here
[SWDEV-557937](https://ontrack-internal.amd.com/browse/SWDEV-557937)


(cherry picked from commit c82113e)

Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>
jataylo added a commit to ROCm/pytorch that referenced this pull request Dec 10, 2025
…ytorch#162052) (#2866)

Improve tanh performance on ROCm when the data type is float32 or lower
precision. For float64, retain the current implementation.

Requires commits that will be present in Triton 3.6.

Pull Request resolved: pytorch#162052
Approved by: https://github.com/jeffdaily,
https://github.com/naromero77amd, https://github.com/eellison,
https://github.com/mlazos, https://github.com/v0i0,
https://github.com/shunting314

(cherry picked from commit 9b885b0)

Other notes:
- Similar to upstream PyTorch except we backported fast_tanhf back to
Triton 3.3.
- Resolves
[SWDEV-569928](https://ontrack-internal.amd.com/browse/SWDEV-569928)
- Resolves inductor failures here
[SWDEV-557937](https://ontrack-internal.amd.com/browse/SWDEV-557937)

(cherry picked from commit c82113e)

Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>

Labels

ciflow/inductor
ciflow/inductor-rocm (Trigger "inductor" config CI on ROCm)
ciflow/rocm (Trigger "default" config CI on ROCm)
ciflow/trunk (Trigger trunk jobs on your pull request)
keep-going (Don't stop on first failure, keep running tests until the end)
Merged
module: inductor
module: rocm (AMD GPU support for Pytorch)
open source
release notes: inductor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants