[XPU] Add flash_attn2 support for XPU #41956
What does this PR do?
XPU now implements the basic functionality of `flash_attn2` in `kernels-community/flash-attn2`, so this PR adds the corresponding function call to `transformers`.

Main contributions:
- `flash_attn2` support for XPU;
- `flash_attn2` UTs on XPU.

With this change, `attn_implementation="flash_attention_2"` or `attn_implementation="kernels-community/flash-attn"` can be used on XPU to invoke this feature.
Note (CI):
- `XXX::test_flash_attn_2_inference_equivalence`, `XXX::test_flash_attn_2_equivalence`, etc. sometimes pass and sometimes fail due to non-determinism in the `flash-attn2` kernel implementation; I have observed this on both CUDA (A100) and XPU.
- `tests/models/kosmos2/test_modeling_kosmos2.py::Kosmos2ModelTest::test_eager_matches_fa2_generate` triggers `RuntimeError: cu_seqlens_q must have shape (batch_size + 1)` on CUDA, while on XPU it results directly in `Aborted (core dumped)`. This appears to stem from limited robustness of the underlying data-reading function, but the root cause is an issue in the test case itself.