Conversation
- Remove seq_lens parameter from dispatch_attention_fn
- Update varlen backends to extract seqlens from masks
- Update QwenImage to pass 2D joint_attention_mask
- Fix native backend to handle 2D boolean masks
- Fix sage_varlen seqlens_q to match seqlens_k for self-attention

Note: sage_varlen still producing black images, needs further investigation
…to txt_seq_lens
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Enhances documentation with comprehensive performance insights for the QwenImage pipeline.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Yes. Next steps:
callback_on_step_end: Callable[[int, int], None] | None = None,
callback_on_step_end_tensor_inputs: list[str] = ["latents"],
max_sequence_length: int = 512,
batch_negative: bool = False,  # TODO: remove, only for testing
All changes in this file are only for testing and should be reverted before merge.
Done
cu_seqlens_q[1:] = torch.cumsum(seqlens_q, dim=0)
cu_seqlens_k[1:] = torch.cumsum(seqlens_k, dim=0)
max_seqlen_q = seqlens_q.max().item()  # TODO: item() is inefficient and breaks torch.compile graphs. Use the 'seq_len' parameter instead (see split attention backend).
I have benchmarked the new varlen attention in torch 2.10:
https://docs.pytorch.org/docs/stable/nn.attention.varlen.html
Performance isn't bad, but in my test case split attention was always faster once you account for the preparation of the varlen tokens, even with the preparation torch.compiled.
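For reference, a minimal sketch of the preparation step being discussed, assuming a 2D boolean padding mask of shape (batch, seq_len) and right-padded tokens; the function name and signature are illustrative, not the backend's actual code:

```python
import torch

def varlen_prep(hidden_states, attention_mask):
    # Per-sample valid lengths from the 2D boolean mask: (batch,)
    seqlens = attention_mask.sum(dim=1, dtype=torch.int32)
    # Cumulative offsets expected by varlen kernels: (batch + 1,)
    cu_seqlens = torch.zeros(seqlens.numel() + 1, dtype=torch.int32, device=seqlens.device)
    cu_seqlens[1:] = torch.cumsum(seqlens, dim=0)
    # Pack only the valid tokens into one ragged (total_tokens, dim) tensor.
    packed = hidden_states[attention_mask]
    # .item() forces a GPU->CPU sync and breaks torch.compile graphs; passing a
    # known sequence length from the caller (as the TODO above suggests) avoids it.
    max_seqlen = int(seqlens.max().item())
    return packed, cu_seqlens, max_seqlen
```

Even when this prep is compiled, the packing and unpacking around every attention call is the overhead that makes split attention come out ahead in my benchmark.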
What does this PR do?
Most recent models use variable-length captions (Qwen, Chroma, Z-Image, ...) and require attention masking if the batch size is > 1 with multiple captions.
torch SDPA only uses its internal flash attention kernel if there is no attention mask. Otherwise it falls back to a different kernel that is significantly slower, especially at high sequence lengths.
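A quick way to see the gap, as a hedged sketch (CUDA device, shapes, and padding length are illustrative, roughly sized like a Qwen-Image sequence):

```python
import torch
import torch.nn.functional as F

q = torch.randn(2, 24, 4096, 128, device="cuda", dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Boolean padding mask: the second sample only attends to its first 1024 tokens.
mask = torch.ones(2, 1, 1, 4096, dtype=torch.bool, device="cuda")
mask[1, ..., 1024:] = False

def bench(fn, iters=20):
    fn()  # warmup
    start, end = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per call

print("no mask  :", bench(lambda: F.scaled_dot_product_attention(q, k, v)))
print("bool mask:", bench(lambda: F.scaled_dot_product_attention(q, k, v, attn_mask=mask)))
```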
This PR implements an attention backend that splits the attention batch into individual samples. Even though attention then has to be called multiple times, it is still faster than masked attention (tested up to batch size 8).
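The idea in minimal form, as a sketch rather than the PR's actual backend code; it assumes right-padding and per-sample lengths already derived from the mask:

```python
import torch
import torch.nn.functional as F

def split_attention(q, k, v, seq_lens):
    # q, k, v: (batch, heads, seq_len, head_dim); seq_lens[i] = valid tokens in sample i.
    out = torch.zeros_like(q)
    for i, n in enumerate(seq_lens):
        # Each call sees a dense, unpadded sequence, so SDPA can pick its flash
        # kernel; positions beyond n stay zero in the output.
        out[i : i + 1, :, :n] = F.scaled_dot_product_attention(
            q[i : i + 1, :, :n], k[i : i + 1, :, :n], v[i : i + 1, :, :n]
        )
    return out
```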
This PR also lays the groundwork for efficiently using "flash varlen" and other varlen attention backends, which are already implemented but not in an efficient way (see code comment).
This PR is based on @kashif's and @cdutr's work in #12702.
Benchmarks
Training benchmarks using OneTrainer; training at higher resolutions benefits the most:

Inference benchmark using the diffusers Qwen example script (but with regional compilation):

Inference benefits when comparing apples to apples, i.e. batch size 2 for CFG. However, the current pipeline already avoids attention masks by calling the transformer twice at batch size 1, so the practical improvement for inference is only slight.
Who can review?
@yiyixuxu and @sayakpaul
CC @kashif and @cdutr for feedback
Contains #12892.