[xpu][feature][1/N] Enable SDPA XPU FlashAttention backend with SYCL-TLA implementation #169101
LuFinch wants to merge 4 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/169101
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure: as of commit b44f8be with merge base a5436a5, the following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from d1a7dbc to bae7b6d (Compare)
Hi @atalman, this PR aims to accelerate attention performance by using the Intel template kernel library. For the XPU part, it looks good to me. However, we have to modify the …
To add the ciflow label, please first approve the workflows that are pending approval. This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed: xpu / linux-jammy-xpu-n-1-py3.10 / build. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed: trunk / linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.gfx942.1). Details for Dev Infra team: raised by workflow job.
This PR only adds the XPU SYCL-TLA-based FlashAttention. It does not touch any ROCm code.
@pytorchbot merge -i
Merge started: Your change will be merged while ignoring the following 1 check: trunk / linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.gfx942.1). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…LA implementation (#167057)

This is a PR to upstream the [SYCL-TLA](https://github.com/intel/sycl-tla) version of FlashAttention for PyTorch XPU. This is the second PR in the stack: it registers the SYCL-TLA FlashAttention forward/backward XPU kernels into SDPA's FlashAttention XPU backend.

PR stack:
- #169101
- #167057

Currently, we support Intel Ponte Vecchio and Battlemage on Linux; support for other platforms is work in progress.

Pull Request resolved: #167057
Approved by: https://github.com/EikanWang, https://github.com/drisspg
Co-authored-by: Eikan Wang <eikan.wang@intel.com>
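For context, a minimal sketch (not code from this PR) of how the registered forward/backward kernels would be exercised through the public SDPA API; it assumes a PyTorch build with XPU support and an Intel GPU visible as the `xpu` device, and the shapes/dtype are illustrative only:

```python
# Minimal usage sketch; assumes a PyTorch build with XPU support.
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = torch.randn(2, 8, 1024, 64, device="xpu", dtype=torch.float16, requires_grad=True)
k = torch.randn(2, 8, 1024, 64, device="xpu", dtype=torch.float16, requires_grad=True)
v = torch.randn(2, 8, 1024, 64, device="xpu", dtype=torch.float16, requires_grad=True)

# Restrict SDPA dispatch to the FlashAttention backend so the flash kernels
# (on XPU, the SYCL-TLA ones registered by this stack) are selected.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# The backward pass exercises the registered FlashAttention backward kernel.
out.sum().backward()
```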
…TLA implementation (#169101)

This is a PR to utilize [SYCL-TLA](https://github.com/intel/sycl-tla)-based FlashAttention to accelerate `scaled_dot_product_attention` for PyTorch XPU.

PR stack:
- #169101
- #167057

Pull Request resolved: #169101
Approved by: https://github.com/EikanWang, https://github.com/atalman
This is a PR to utilize SYCL-TLA-based FlashAttention to accelerate `scaled_dot_product_attention` for PyTorch XPU.

PR stack:
- #169101
- #167057
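As a hedged illustration of the intended user-facing effect (not code from this PR): once the FlashAttention XPU backend is available, a plain `scaled_dot_product_attention` call on XPU tensors can be served by the flash path chosen by SDPA's backend selection. The availability check and tensor shapes below are assumptions for the sketch.

```python
# Illustrative sketch only; assumes a PyTorch build with XPU support.
import torch
import torch.nn.functional as F

if torch.xpu.is_available():
    q = torch.randn(1, 16, 512, 64, device="xpu", dtype=torch.float16)
    k = torch.randn(1, 16, 512, 64, device="xpu", dtype=torch.float16)
    v = torch.randn(1, 16, 512, 64, device="xpu", dtype=torch.float16)

    with torch.no_grad():
        # With the FlashAttention XPU backend available, SDPA's backend
        # selection can dispatch this call to the flash kernels.
        out = F.scaled_dot_product_attention(q, k, v)
    print(out.shape)  # torch.Size([1, 16, 512, 64])
```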