
[xpu][feature][1/N] Enable SDPA XPU FlashAttention backend with SYCL-TLA implementation #169101

Closed
LuFinch wants to merge 4 commits into pytorch:main from LuFinch:lfq/upstream_fa2_PR

@LuFinch
Contributor

@LuFinch LuFinch commented Nov 26, 2025

@pytorch-bot

pytorch-bot bot commented Nov 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/169101

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit b44f8be with merge base a5436a5:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@LuFinch LuFinch force-pushed the lfq/upstream_fa2_PR branch from d1a7dbc to bae7b6d Compare November 26, 2025 06:44
@EikanWang EikanWang requested a review from atalman November 26, 2025 07:33
@EikanWang
Collaborator

Hi @atalman, this PR aims to accelerate attention performance by using the Intel template kernel library. The XPU part looks good to me. However, we have to modify CMakeLists.txt to compile the newly added source files, so could you help review the PR as well? Thanks in advance!

@EikanWang EikanWang added the ciflow/xpu Run XPU CI tasks label Nov 28, 2025
@pytorch-bot

pytorch-bot bot commented Nov 28, 2025

To add the ciflow label ciflow/xpu please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Nov 28, 2025
@EikanWang EikanWang added topic: not user facing topic category ciflow/xpu Run XPU CI tasks labels Nov 28, 2025
@pytorch-bot

pytorch-bot bot commented Nov 28, 2025

To add the ciflow label ciflow/xpu please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Nov 28, 2025
@EikanWang EikanWang added ciflow/trunk Trigger trunk jobs on your pull request ciflow/xpu Run XPU CI tasks labels Nov 28, 2025
@pytorch-bot

pytorch-bot bot commented Nov 28, 2025

To add the ciflow label ciflow/trunk please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot

pytorch-bot bot commented Nov 28, 2025

To add the ciflow label ciflow/xpu please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed ciflow/trunk Trigger trunk jobs on your pull request ciflow/xpu Run XPU CI tasks labels Nov 28, 2025
@EikanWang EikanWang added ciflow/xpu Run XPU CI tasks ciflow/trunk Trigger trunk jobs on your pull request labels Nov 28, 2025
@pytorch-bot

pytorch-bot bot commented Nov 28, 2025

To add the ciflow label ciflow/trunk please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/trunk Trigger trunk jobs on your pull request label Nov 28, 2025
Contributor

@atalman atalman left a comment

lgtm

@EikanWang
Collaborator

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 1, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: xpu / linux-jammy-xpu-n-1-py3.10 / build

Details for Dev Infra team Raised by workflow job

@pytorch-bot pytorch-bot bot removed ciflow/trunk Trigger trunk jobs on your pull request ciflow/xpu Run XPU CI tasks labels Dec 1, 2025
@LuFinch LuFinch requested a review from EikanWang December 1, 2025 14:55
@EikanWang
Collaborator

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 1, 2025
@EikanWang EikanWang added the ciflow/xpu Run XPU CI tasks label Dec 1, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.gfx942.1)

Details for Dev Infra team Raised by workflow job

@EikanWang
Collaborator

This PR only adds SYCL-TLA-based FlashAttention for XPU. It does not touch any ROCm code.

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.gfx942.1)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pytorchmergebot pushed a commit that referenced this pull request Dec 3, 2025
…LA implementation (#167057)

This PR upstreams the [SYCL-TLA](https://github.com/intel/sycl-tla) version of FlashAttention for PyTorch XPU.

This is the second PR in the stack; it registers the SYCL-TLA FlashAttention forward and backward XPU kernels with SDPA's FlashAttention XPU backend.

PR stacks:
- #169101
- #167057

Currently, we support Intel Ponte Vecchio and Battlemage on Linux. Support for other platforms is a work in progress.

Pull Request resolved: #167057
Approved by: https://github.com/EikanWang, https://github.com/drisspg

Co-authored-by: Eikan Wang <eikan.wang@intel.com>
JacobSzwejbka pushed a commit that referenced this pull request Dec 8, 2025
…TLA implementation (#169101)

This PR uses [SYCL-TLA](https://github.com/intel/sycl-tla)-based FlashAttention to accelerate `scaled_dot_product_attention` on PyTorch XPU.

PR stacks:
- #169101
- #167057

Pull Request resolved: #169101
Approved by: https://github.com/EikanWang, https://github.com/atalman
JacobSzwejbka pushed a commit that referenced this pull request Dec 8, 2025
…LA implementation (#167057)

This PR upstreams the [SYCL-TLA](https://github.com/intel/sycl-tla) version of FlashAttention for PyTorch XPU.

This is the second PR in the stack; it registers the SYCL-TLA FlashAttention forward and backward XPU kernels with SDPA's FlashAttention XPU backend.

PR stacks:
- #169101
- #167057

Currently, we support Intel Ponte Vecchio and Battlemage on Linux. Support for other platforms is a work in progress.

Pull Request resolved: #167057
Approved by: https://github.com/EikanWang, https://github.com/drisspg

Co-authored-by: Eikan Wang <eikan.wang@intel.com>
@LuFinch LuFinch deleted the lfq/upstream_fa2_PR branch January 28, 2026 05:31

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), ciflow/xpu (Run XPU CI tasks), Merged, open source, topic: not user facing (topic category)

5 participants