[xpu][feature][1/N] Enable SDPA XPU FlashAttention backend with SYCL-TLA implementation #169101
LuFinch wants to merge 4 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/169101
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure: as of commit b44f8be with merge base a5436a5, the following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from d1a7dbc to bae7b6d (Compare)
Hi @atalman, this PR aims to accelerate attention performance by using the Intel template kernel library. For the XPU part, it looks good to me. However, we have to modify the …
To add the ciflow label, please first approve the workflows that are pending approval. This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed: xpu / linux-jammy-xpu-n-1-py3.10 / build. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed: trunk / linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.gfx942.1). Details for Dev Infra team: raised by workflow job.
This PR only adds the XPU SYCL-TLA-based FlashAttention. It does not touch any ROCm code.
@pytorchbot merge -i
Merge started: Your change will be merged while ignoring the following 1 check: trunk / linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.gfx942.1). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…LA implementation (#167057)

This is a PR to upstream the [SYCL-TLA](https://github.com/intel/sycl-tla) version of FlashAttention for PyTorch XPU. This is the second PR in the stack: it registers the SYCL-TLA FlashAttention forward/backward XPU kernels into SDPA's FlashAttention XPU backend.

PR stack:
- #169101
- #167057

Currently, we support Intel Ponte Vecchio and Battlemage on Linux; support for other platforms is work in progress.

Pull Request resolved: #167057
Approved by: https://github.com/EikanWang, https://github.com/drisspg
Co-authored-by: Eikan Wang <eikan.wang@intel.com>
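For context, a minimal sketch (not code from this PR) of how the registered forward/backward kernels would be exercised through the public SDPA API; it assumes a PyTorch build with XPU support and an Intel GPU visible as the `xpu` device, and the shapes/dtype are illustrative only:

```python
# Minimal usage sketch; assumes a PyTorch build with XPU support.
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = torch.randn(2, 8, 1024, 64, device="xpu", dtype=torch.float16, requires_grad=True)
k = torch.randn(2, 8, 1024, 64, device="xpu", dtype=torch.float16, requires_grad=True)
v = torch.randn(2, 8, 1024, 64, device="xpu", dtype=torch.float16, requires_grad=True)

# Restrict SDPA dispatch to the FlashAttention backend so the flash kernels
# (on XPU, the SYCL-TLA ones registered by this stack) are selected.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# The backward pass exercises the registered FlashAttention backward kernel.
out.sum().backward()
```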
…TLA implementation (#169101)

This is a PR to utilize [SYCL-TLA](https://github.com/intel/sycl-tla)-based FlashAttention to accelerate `scaled_dot_product_attention` for PyTorch XPU.

PR stack:
- #169101
- #167057

Pull Request resolved: #169101
Approved by: https://github.com/EikanWang, https://github.com/atalman
This is a PR to utilize SYCL-TLA-based FlashAttention to accelerate `scaled_dot_product_attention` for PyTorch XPU.

PR stack:
- #169101
- #167057
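As a hedged illustration of the intended user-facing effect (not code from this PR): once the FlashAttention XPU backend is available, a plain `scaled_dot_product_attention` call on XPU tensors can be served by the flash path chosen by SDPA's backend selection. The availability check and tensor shapes below are assumptions for the sketch.

```python
# Illustrative sketch only; assumes a PyTorch build with XPU support.
import torch
import torch.nn.functional as F

if torch.xpu.is_available():
    q = torch.randn(1, 16, 512, 64, device="xpu", dtype=torch.float16)
    k = torch.randn(1, 16, 512, 64, device="xpu", dtype=torch.float16)
    v = torch.randn(1, 16, 512, 64, device="xpu", dtype=torch.float16)

    with torch.no_grad():
        # With the FlashAttention XPU backend available, SDPA's backend
        # selection can dispatch this call to the flash kernels.
        out = F.scaled_dot_product_attention(q, k, v)
    print(out.shape)  # torch.Size([1, 16, 512, 64])
```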