[ROCm/Windows] Support aotriton for scaled_dot_product_attention on Windows. #162330
jammm wants to merge 4 commits into pytorch:main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162330
Note: Links to docs will display an error until the docs builds have been completed.
❌ 19 New Failures, 2 Cancelled Jobs, 2 Unrelated Failures (as of commit 4f46a52 with merge base 5b9114b)
NEW FAILURES - The following jobs have failed:
CANCELLED JOBS - The following jobs were cancelled. Please retry:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
cc @ScottTodd |
|
cc @xinyazhang |
|
@pytorchbot label "release notes: rocm |
|
❌ 🤖 pytorchbot command failed: |
|
@pytorchbot label "release notes: rocm" |
|
@pytorchbot label "topic: performance" |
|
@pytorchbot label "module: windows" |
Force-pushed from a44f41f to f7ebef2
|
Wait a minute, so this is actually TheRock's WoW |
Force-pushed from f7ebef2 to 3c50ab2
|
Great to see Xinya has approved :D Who else do we need here as a reviewer with merge privileges? Jeff? |
```diff
 if(USE_ROCM)
-  if(UNIX AND (USE_FLASH_ATTENTION OR USE_MEM_EFF_ATTENTION))
+  if(USE_FLASH_ATTENTION OR USE_MEM_EFF_ATTENTION)
     include(cmake/External/aotriton.cmake)
```
Thanks, I tested this using https://github.com/ROCm/TheRock/blob/main/external-builds/pytorch/build_prod_wheels.py, with and without `--enable-pytorch-flash-attention-windows`.

- Both builds succeeded.
- Running PyTorch succeeded with aotriton enabled, and ComfyUI seemed to generate images on my gfx1100 GPU using the memory-efficient attention implementation (after setting the `TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1` env var); a minimal sanity check is sketched after the log below.
- With the option enabled, I see 45 MB more logs (15 MB -> 60 MB), including 5337 instances of this warning. It seems to just be a warning, possibly fixed by forcing Python into UTF-8 mode (will verify):
```
Message: '%s %s -> %s'
Arguments: ('copying', 'torch\\lib\\aotriton.images\\amd-gfx11xx\\flash\\bwd_kernel_dq\\FONLY__\uff0afp32@16_48_0_T_T_1___gfx11xx.aks2', 'build\\lib.win-amd64-cpython-312\\torch\\lib\\aotriton.images\\amd-gfx11xx\\flash\\bwd_kernel_dq')
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Nod-Shark16\AppData\Local\Programs\Python\Python312\Lib\logging\__init__.py", line 1163, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Nod-Shark16\AppData\Local\Programs\Python\Python312\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\uff0a' in position 73: character maps to <undefined>
Call stack:
  File "D:\b\pytorch_main\setup.py", line 1785, in <module>
    main()
  File "D:\b\pytorch_main\setup.py", line 1766, in main
    setup(
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\__init__.py", line 117, in setup
    return distutils.core.setup(**attrs)
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\_distutils\core.py", line 186, in setup
    return run_commands(dist)
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\_distutils\core.py", line 202, in run_commands
    dist.run_commands()
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\_distutils\dist.py", line 1002, in run_commands
    self.run_command(cmd)
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\dist.py", line 1104, in run_command
    super().run_command(command)
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\_distutils\dist.py", line 1021, in run_command
    cmd_obj.run()
  File "D:\b\pytorch_main\setup.py", line 1353, in run
    super().run()
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\command\bdist_wheel.py", line 370, in run
    self.run_command("build")
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\_distutils\cmd.py", line 357, in run_command
    self.distribution.run_command(command)
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\dist.py", line 1104, in run_command
    super().run_command(command)
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\_distutils\dist.py", line 1021, in run_command
    cmd_obj.run()
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\_distutils\command\build.py", line 135, in run
    self.run_command(cmd_name)
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\_distutils\cmd.py", line 357, in run_command
    self.distribution.run_command(command)
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\dist.py", line 1104, in run_command
    super().run_command(command)
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\_distutils\dist.py", line 1021, in run_command
    cmd_obj.run()
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\command\build_py.py", line 78, in run
    self.build_package_data()
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\command\build_py.py", line 171, in build_package_data
    _outf, _copied = self.copy_file(srcfile, target)
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\command\build_py.py", line 64, in copy_file
    return super().copy_file(  # pyright: ignore[reportReturnType] # pypa/distutils#309
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\_distutils\cmd.py", line 421, in copy_file
    return file_util.copy_file(
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\_distutils\file_util.py", line 130, in copy_file
    log.info("%s %s -> %s", action, src, dir)
```
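Not part of the PR itself, but for anyone reproducing the test above: a minimal sketch of how one might confirm the memory-efficient SDPA path on a gfx1100 card, assuming a ROCm/Windows build with aotriton enabled (tensor shapes and dtype are arbitrary choices here, not from this thread):

```python
# Hedged sketch, not the exact check used above. The env var is assumed to need
# setting before the first SDPA dispatch, so set it before importing torch.
import os
os.environ["TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL"] = "1"  # needed for gfx11xx at the time of this PR

import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Arbitrary (batch, heads, seq_len, head_dim) problem size in fp16.
q, k, v = (torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16) for _ in range(3))

# Restrict dispatch to the memory-efficient backend so a silent fallback to the
# math path would raise instead of masking a missing aotriton build.
with sdpa_kernel(SDPBackend.EFFICIENT_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)

print(out.shape)  # torch.Size([1, 8, 128, 64])
```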
I rebuilt (without fully cleaning my build/source dirs) with the `PYTHONUTF8=1` environment variable and didn't see the warnings. Hopefully a clean rebuild (including deleting `torch/lib/aotriton.images/` in the source dir) is also warning-free. We can add that env var to our downstream build script and any upstream build scripts we contribute (see #160776).
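For context, a self-contained illustration (not taken from the build logs) of why the default cp1252 stream encoding fails on these aotriton kernel image names, and why UTF-8 mode (`PYTHONUTF8=1`) avoids it:

```python
# The fullwidth asterisk (U+FF0A) in the aotriton kernel image name has no cp1252 mapping,
# so any logging handler writing to a cp1252-encoded stream raises UnicodeEncodeError.
name = "FONLY__\uff0afp32@16_48_0_T_T_1___gfx11xx.aks2"

try:
    name.encode("cp1252")
except UnicodeEncodeError as exc:
    print(exc)  # 'charmap' codec can't encode character '\uff0a' ...

# UTF-8 can represent the name, which is what PYTHONUTF8=1 gives stdio and logging streams.
print(name.encode("utf-8"))
```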
|
@jeffdaily PTAL. Received approval from @xinyazhang and @ScottTodd. |
## Motivation

Progress on #1040, getting closer to enabling aotriton in PyTorch on Windows.

## Technical Details

This will supersede #1409 and is dependent on pytorch/pytorch#162330. The UTF8 change I believe helps with warnings about logs for copying files with unicode characters in their names:

```
Message: '%s %s -> %s'
Arguments: ('copying', 'torch\\lib\\aotriton.images\\amd-gfx11xx\\flash\\bwd_kernel_dq\\FONLY__\uff0afp32@16_48_0_T_T_1___gfx11xx.aks2', 'build\\lib.win-amd64-cpython-312\\torch\\lib\\aotriton.images\\amd-gfx11xx\\flash\\bwd_kernel_dq')
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Nod-Shark16\AppData\Local\Programs\Python\Python312\Lib\logging\__init__.py", line 1163, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Nod-Shark16\AppData\Local\Programs\Python\Python312\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\uff0a' in position 73: character maps to <undefined>
Call stack:
  File "D:\b\pytorch_main\setup.py", line 1785, in <module>
    main()
  File "D:\b\pytorch_main\setup.py", line 1766, in main
    setup(
  File "D:\projects\TheRock\external-builds\pytorch\3.12.venv\Lib\site-packages\setuptools\__init__.py", line 117, in setup
    return distutils.core.setup(**attrs)
```

## Test Plan

Tested with local builds on Windows with and without `--enable-pytorch-flash-attention-windows`.

## Test Result

Builds succeeded, ComfyUI generated images on my gfx1100 GPU (needed `TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1` for aotriton on that GPU).

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests
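A hedged sketch of how a downstream build script could opt the `setup.py` child process into UTF-8 mode; the command and paths below are placeholders, not the actual `build_prod_wheels.py` interface:

```python
# Sketch only: force UTF-8 mode for the PyTorch build subprocess so copying the
# aotriton kernel images (which have non-ASCII names) does not trip logging errors.
import os
import subprocess
import sys

env = dict(os.environ, PYTHONUTF8="1")
subprocess.run(
    [sys.executable, "setup.py", "bdist_wheel"],  # placeholder invocation
    cwd="pytorch",                                # placeholder source dir
    env=env,
    check=True,
)
```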
|
Lint test fails because: But |
No that's the fix to the regression that broke the CUDA builds. The merge failures are unrelated and should be fixed once they're fixed elsewhere |
Kinda curious where and when they should be fixed 🤔 Oh, #162881 (comment) 👀 |
|
@pytorchbot merge -f "the cuda build OOM that caused a revert of this PR has been fixed, all other failures are unrelated" |
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
…indows. (pytorch#162330) Enables flash attention and/or memory efficient attention on Windows with scaled_dot_product_attention via aotriton. Already tested to be working on Windows with TheRock. Steps to enable: simply set `USE_FLASH_ATTENTION=1` and `USE_MEM_EFF_ATTENTION=1` as usual. See https://github.com/ROCm/TheRock/blob/main/external-builds/pytorch/build_prod_wheels.py#L578-L604 Pull Request resolved: pytorch#162330 Approved by: https://github.com/xinyazhang, https://github.com/ScottTodd, https://github.com/jeffdaily Co-authored-by: Scott Todd <scott.todd0@gmail.com>
…ion on Windows. (pytorch#162330)" This reverts commit 62843c1. Reverted pytorch#162330 on behalf of https://github.com/atalman due to Sorry reverting looks like broke windows nightlies see pytorch#162881 ([comment](pytorch#162330 (comment)))
…indows. (pytorch#162330) Enables flash attention and/or memory efficient attention on Windows with scaled_dot_product_attention via aotriton. Already tested to be working on Windows with TheRock. Steps to enable: simply set `USE_FLASH_ATTENTION=1` and `USE_MEM_EFF_ATTENTION=1` as usual. See https://github.com/ROCm/TheRock/blob/main/external-builds/pytorch/build_prod_wheels.py#L578-L604 Pull Request resolved: pytorch#162330 Approved by: https://github.com/jeffdaily Co-authored-by: Scott Todd <scott.todd0@gmail.com>
Fixes: pytorch#163958 Cherry-pick pytorch#161754 Cherry-pick pytorch#162330 Cherry-pick pytorch#163373 Cherry-pick pytorch#163745 Note TF32 support is still being plagued by `HIPBLASLT_ALLOW_TF32`, which should be handled by another PR due to its complexity. --------- Co-authored-by: Aaryaman Vasishta <aaryaman.vasishta@amd.com> Co-authored-by: Scott Todd <scott.todd0@gmail.com>
Enables flash attention and/or memory efficient attention on Windows with scaled_dot_product_attention via aotriton.
Already tested to be working on Windows with TheRock.
Steps to enable: simply set `USE_FLASH_ATTENTION=1` and `USE_MEM_EFF_ATTENTION=1` as usual. See https://github.com/ROCm/TheRock/blob/main/external-builds/pytorch/build_prod_wheels.py#L578-L604 (a quick runtime check is sketched after the cc list below).

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd
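For anyone validating a wheel built this way, a quick runtime check (a sketch, not part of this change); note that `flash_sdp_enabled`/`mem_efficient_sdp_enabled` report whether the backends are allowed by the runtime toggles, not whether a given input will actually select them:

```python
# Quick post-install check of a ROCm wheel built with USE_FLASH_ATTENTION=1 and
# USE_MEM_EFF_ATTENTION=1.
import torch

print(torch.version.hip)                                # non-None on a ROCm build
print(torch.backends.cuda.flash_sdp_enabled())          # flash attention backend toggle
print(torch.backends.cuda.mem_efficient_sdp_enabled())  # memory-efficient backend toggle
```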