[core] support sage attention + FA2 through kernels #12439
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
MekkCyber
left a comment
Very cool! I will try to look into the torch compile compatibility. As for the other variants, they are the same as sageattn. What I mean is that sageattn is just a wrapper that dispatches to the correct kernel depending on the hardware used: https://github.com/thu-ml/SageAttention/blob/main/sageattention/core.py#L140
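As a rough illustration of that hardware-based dispatch, here is a minimal sketch; the compute-capability thresholds and variant names below are assumptions for illustration, not SageAttention's exact mapping:

```python
import torch


def pick_sage_variant() -> str:
    # Illustrative only: map the GPU's compute capability to a kernel family,
    # mirroring the hardware-based dispatch that sageattn performs internally.
    major, minor = torch.cuda.get_device_capability()
    arch = major * 10 + minor
    if arch >= 90:
        return "qk_int8_pv_fp8_cuda_sm90"  # Hopper-class GPUs
    if arch >= 89:
        return "qk_int8_pv_fp8_cuda"       # Ada-class GPUs
    return "qk_int8_pv_fp16_cuda"          # Ampere and earlier
```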
So, you mean we shouldn't need to have different dispatched functions like this?
Yes, I think we don't need that because it depends on the hardware. For example, if a user chooses:
```python
_SAGE_ATTENTION_PV_ACCUM_DTYPE = Literal["fp32", "fp32+fp32"]
_SAGE_ATTENTION_QK_QUANT_GRAN = Literal["per_thread", "per_warp"]
_SAGE_ATTENTION_QUANTIZATION_BACKEND = Literal["cuda", "triton"]
```
I don't see their usage, hence removed.
FYI, I've ported SageAttention to the Python stable ABI (abi3) and the libtorch stable ABI, which should simplify building for HF Kernels. There are also some refactors in my main branch to simplify building. If someone can maintain the build system, then I no longer need to maintain my repo :)
This PR is ready to be reviewed now. As discussed with @MekkCyber over DMs, we're disabling […]. In order for us to support it with […]. I think we should be good with the PR.

Cc: @MekkCyber @DN6
@DN6 it should be up for another review. I have updated the test suite and ensured that the tests pass successfully as well. PTAL.
DN6
left a comment
Good to merge. But we need to remove the parallel config check and use supports_context_parallel=False
```python
    return_lse: bool = False,
    _parallel_config: Optional["ParallelConfig"] = None,
) -> torch.Tensor:
    if _parallel_config:
```
Use:
```python
@_AttentionBackendRegistry.register(
    AttentionBackendName.FLASH_HUB,
    constraints=[_check_device, _check_qkv_dtype_bf16_or_fp16, _check_shape],
    supports_context_parallel=False,
)
```

It will raise an error when trying to enable parallelism with this backend. This check isn't needed.
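For illustration, a sketch of what the registered op could look like once the runtime guard is removed. It assumes it lives inside diffusers' attention dispatch module, where the registry, constraint helpers, and ParallelConfig are defined; the function name is an assumption and the body is elided:

```python
# Sketch only; a fragment as it would appear inside the attention dispatch module.
from typing import Optional

import torch


@_AttentionBackendRegistry.register(
    AttentionBackendName.FLASH_HUB,
    constraints=[_check_device, _check_qkv_dtype_bf16_or_fp16, _check_shape],
    supports_context_parallel=False,
)
def _flash_attention_hub(  # function name is an assumption
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    return_lse: bool = False,
    _parallel_config: Optional["ParallelConfig"] = None,
) -> torch.Tensor:
    # No runtime `if _parallel_config:` guard here: registering the backend with
    # supports_context_parallel=False makes the registry raise if context
    # parallelism is requested for this backend.
    ...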
What does this PR do?
Code to test (SAGE):
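The snippet is collapsed in this view; below is a minimal sketch of this kind of test, assuming a Flux pipeline and that the Hub-kernels Sage backend is selected via set_attention_backend (the backend string "sage_hub", model id, and prompt are assumptions):

```python
# Sketch only: backend string, model id, and prompt are assumptions.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Route the transformer's attention through the Sage kernel fetched via `kernels`.
pipe.transformer.set_attention_backend("sage_hub")

image = pipe(
    "a photo of a dog dressed as an astronaut",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("sage_hub.png")
```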
Result:

FA2:
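Similarly, a sketch for the FA2-through-kernels path, reusing the pipeline from the Sage sketch above; the backend string is inferred from the AttentionBackendName.FLASH_HUB name in the review thread and is an assumption:

```python
# Switch the same pipeline to the Hub-provided FlashAttention-2 backend.
pipe.transformer.set_attention_backend("flash_hub")

image = pipe(
    "a photo of a dog dressed as an astronaut",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flash_hub.png")
```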
Notes
torch.compile support when using sage attention, like we have for flash and flash 3. Currently, this fails.
Code to test:
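The repro snippet is collapsed; a hedged sketch of the failing combination, continuing from the Sage example above (the "sage_hub" backend string is an assumption):

```python
# Enable the Sage backend and then compile the transformer.
pipe.transformer.set_attention_backend("sage_hub")
pipe.transformer.compile(fullgraph=True)

# Compilation is triggered on the first forward pass; with the Sage backend this
# currently errors out (see the pastebin trace below).
_ = pipe("a photo of a dog", num_inference_steps=2).images[0]
```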
Error: https://pastebin.com/3HS6HNzR
There are other sageattn variants (see here), which would be cool to expose from the Hub kernel.

Cc: @MekkCyber