Pull requests: Dao-AILab/flash-attention
[ROCM] Add support with Infinity Cache (LLC) awareness for performance improvement - [PR#2147 rebased on PR#2178]
#2217 opened Jan 29, 2026 by tianwyan

[Cute, Flex, Fwd, Sm100] Allow vectorized score_mod definitions
#2215 opened Jan 28, 2026 by reubenconducts

Add shift scheduler for deterministic full-mask FA3 bwd on Hopper (sm90)
#2207 opened Jan 23, 2026 by tie-pilot-qxw

[Cute, SM100, BWD] Refactor get_n_block_max_for_m_block into a method of BlockInfo
#2203 opened Jan 23, 2026 by henrylhtsang

Fix compute_block_sparsity import in benchmark_mask_mod
#2190 opened Jan 17, 2026 by blueberrycongee

[Cute,Fwd,Sm100] support irregular qhead / kvhead ratios
#2186 opened Jan 16, 2026 by timmy-feng (Draft)

Update mha_fwd.cpp, Normalize the commented-out parameters
#2160 opened Jan 9, 2026 by breakfei

Add FLASH_ATTENTION_FORCE_NON_STABLE_API option to allow building on NVidia Pytorch 25.09 image
#2140 opened Jan 5, 2026 by jp-gr

[ROCM] Fix AMD Triton backend crash when dropout != 0 and return_attn_probs = False
#2111 opened Dec 30, 2025 by Logiquo