Commit 1b4c25c

chore: add params modifier docs (vipshop#347)
1 parent e719ff0 commit 1b4c25c

4 files changed: 81 additions and 115 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -204,6 +204,7 @@ For more advanced features such as **Unified Cache APIs**, **Forward Pattern Mat
204204
- [📚Hybrid Forward Pattern](./docs/User_Guide.md#hybrid-forward-pattern)
205205
- [📚Implement Patch Functor](./docs/User_Guide.md#implement-patch-functor)
206206
- [📚Transformer-Only Interface](./docs/User_Guide.md#transformer-only-interface)
207+
- [📚How to use ParamsModifier](./docs/User_Guide.md#how-to-use-paramsmodifier)
207208
- [🤖Cache Acceleration Stats](./docs/User_Guide.md#cache-acceleration-stats-summary)
208209
- [⚡️DBCache: Dual Block Cache](./docs/User_Guide.md#️dbcache-dual-block-cache)
209210
- [⚡️DBPrune: Dynamic Block Prune](./docs/User_Guide.md#️dbprune-dynamic-block-prune)

docs/User_Guide.md

Lines changed: 76 additions & 16 deletions
@@ -1,3 +1,20 @@
1+
<div align="center">
2+
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/cache-dit-logo.png height="120">
3+
<p align="center">
4+
A <b>Unified</b>, Flexible and Training-free <b>Cache Acceleration</b> Framework for <b>🤗Diffusers</b> <br>
5+
♥️ Cache Acceleration with <b>One-line</b> Code ~ ♥️ <br>
6+
</p>
7+
<div align='center'>
8+
<img src=https://img.shields.io/badge/Language-Python-brightgreen.svg >
9+
<img src=https://img.shields.io/badge/PyPI-pass-brightgreen.svg >
10+
<a href="https://pepy.tech/projects/cache-dit"><img src=https://static.pepy.tech/personalized-badge/cache-dit?period=total&units=INTERNATIONAL_SYSTEM&left_color=GRAY&right_color=GREEN&left_text=downloads></a>
11+
<img src=https://img.shields.io/github/issues/vipshop/cache-dit.svg >
12+
<img src=https://img.shields.io/github/stars/vipshop/cache-dit.svg?style=dark >
13+
</div>
14+
<p align="center">
15+
🎉Now, <b>cache-dit</b> covers almost <b>All</b> Diffusers' <b>DiT</b> Pipelines🎉
16+
</div>
17+
118
## 📖User Guide
219

320
<div id="contents"></div>
@@ -12,10 +29,11 @@
1229
- [📚Hybrid Forward Pattern](#hybrid-forward-pattern)
1330
- [📚Implement Patch Functor](#implement-patch-functor)
1431
- [📚Transformer-Only Interface](#transformer-only-interface)
32+
- [📚How to use ParamsModifier](#how-to-use-paramsmodifier)
1533
- [🤖Cache Acceleration Stats](#cache-acceleration-stats-summary)
1634
- [⚡️DBCache: Dual Block Cache](#dbcache)
1735
- [⚡️DBPrune: Dynamic Block Prune](#dbprune)
18-
- [⚡️Hybrid Hybrid Cache CFG](#cfg)
36+
- [⚡️Hybrid Cache CFG](#cfg)
1937
- [🔥Hybrid TaylorSeer Calibrator](#taylorseer)
2038
- [⚡️Hybrid Context Parallelism](#context-parallelism)
2139
- [⚡️Hybrid Tensor Parallelism](#tensor-parallelism)
@@ -31,7 +49,7 @@
3149
You can install the stable release of `cache-dit` from PyPI:
3250

3351
```bash
34-
pip3 install -U cache-dit
52+
pip3 install -U cache-dit # or, pip3 install -U "cache-dit[all]" for all features
3553
```
3654
Or install the latest development version from GitHub:
3755

@@ -48,10 +66,11 @@ Currently, **cache-dit** library supports almost **Any** Diffusion Transformers
4866
```python
4967
>>> import cache_dit
5068
>>> cache_dit.supported_pipelines()
51-
(30, ['Flux*', 'Mochi*', 'CogVideoX*', 'Wan*', 'HunyuanVideo*', 'QwenImage*', 'LTX*', 'Allegro*',
69+
(32, ['Flux*', 'Mochi*', 'CogVideoX*', 'Wan*', 'HunyuanVideo*', 'QwenImage*', 'LTX*', 'Allegro*',
5270
'CogView3Plus*', 'CogView4*', 'Cosmos*', 'EasyAnimate*', 'SkyReelsV2*', 'StableDiffusion3*',
5371
'ConsisID*', 'DiT*', 'Amused*', 'Bria*', 'Lumina*', 'OmniGen*', 'PixArt*', 'Sana*', 'StableAudio*',
54-
'VisualCloze*', 'AuraFlow*', 'Chroma*', 'ShapE*', 'HiDream*', 'HunyuanDiT*', 'HunyuanDiTPAG*'])
72+
'VisualCloze*', 'AuraFlow*', 'Chroma*', 'ShapE*', 'HiDream*', 'HunyuanDiT*', 'HunyuanDiTPAG*',
73+
'Kandinsky5*', 'PRX*'])
5574
```
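
The trailing `*` in each entry reads like a wildcard over pipeline class names. The sketch below assumes that interpretation and checks a class name against a few of the patterns printed above. `SUPPORTED_PATTERNS` and `is_supported` are hypothetical names for illustration and are not part of the cache-dit API.

```python
from fnmatch import fnmatchcase

# A subset of the patterns printed by cache_dit.supported_pipelines() above.
SUPPORTED_PATTERNS = ["Flux*", "Wan*", "QwenImage*", "Kandinsky5*", "PRX*"]


def is_supported(pipeline_class_name: str, patterns=SUPPORTED_PATTERNS) -> bool:
    """Return True if the class name matches any wildcard pattern.

    Assumption: '*' behaves like a shell-style wildcard on the class name.
    """
    return any(fnmatchcase(pipeline_class_name, p) for p in patterns)


print(is_supported("FluxPipeline"))
print(is_supported("WanImageToVideoPipeline"))
print(is_supported("StableDiffusionXLPipeline"))
```

For the authoritative answer, query `cache_dit.supported_pipelines()` at runtime, since the list grows between releases (30 to 32 entries in this commit).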
5675

5776
<details>
@@ -186,13 +205,10 @@ from diffusers import DiffusionPipeline
186205

187206
# Can be any diffusion pipeline
188207
pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image")
189-
190208
# One-line code with default cache options.
191209
cache_dit.enable_cache(pipe)
192-
193210
# Just call the pipe as normal.
194211
output = pipe(...)
195-
196212
# Disable cache and run original pipe.
197213
cache_dit.disable_cache(pipe)
198214
```
@@ -275,13 +291,13 @@ cache_dit.enable_cache(
275291
# value will be overwritten by the new one.
276292
params_modifiers=[
277293
ParamsModifier(
278-
cache_config=DBCacheConfig(
294+
cache_config=DBCacheConfig().reset(
279295
max_warmup_steps=4,
280296
max_cached_steps=8,
281297
),
282298
),
283299
ParamsModifier(
284-
cache_config=DBCacheConfig(
300+
cache_config=DBCacheConfig().reset(
285301
max_warmup_steps=2,
286302
max_cached_steps=20,
287303
),
@@ -291,6 +307,7 @@ cache_dit.enable_cache(
291307
),
292308
)
293309
```
310+
294311
### 📚Implement Patch Functor
295312

296313
For any pattern outside {0...5}, cache-dit provides a simple abstraction called **Patch Functor**. Users can implement a subclass of Patch Functor to convert an unknown pattern into a known one. For some models, users may also need to fuse the operations inside the blocks' for-loop into the block forward.
@@ -341,6 +358,50 @@ cache_dit.enable_cache(
341358
)
342359
```
343360

### 📚How to use ParamsModifier

Some models are more complex: **Wan 2.2 MoE** has more than one transformer (`transformer` and `transformer_2`), and FLUX.1 has multiple lists of transformer blocks (`transformer_blocks` and `single_transformer_blocks`). By default, cache-dit assigns a separate cache context to each `blocks` instance, but all of them share the same `cache_config`. For fine-grained control over individual cache contexts, pass one `ParamsModifier` per `blocks` entry to the `BlockAdapter` or to the `enable_cache(...)` API. The shared `cache_config` is then overwritten by the configuration from each `ParamsModifier`. For example:

```python
import cache_dit
from cache_dit import BlockAdapter, DBCacheConfig, ForwardPattern, ParamsModifier

cache_dit.enable_cache(
    BlockAdapter(
        pipe=pipe,  # FLUX.1, etc.
        transformer=pipe.transformer,
        blocks=[
            pipe.transformer.transformer_blocks,
            pipe.transformer.single_transformer_blocks,
        ],
        forward_pattern=[
            ForwardPattern.Pattern_1,
            ForwardPattern.Pattern_3,
        ],
    ),
    # Basic shared cache config
    cache_config=DBCacheConfig(...),
    # One ParamsModifier per `blocks` entry, in the same order as `blocks`.
    params_modifiers=[
        ParamsModifier(
            # Modified config only for transformer_blocks
            # Must call the `reset` method of DBCacheConfig.
            cache_config=DBCacheConfig().reset(
                Fn_compute_blocks=8,
                residual_diff_threshold=0.08,
            ),
        ),
        ParamsModifier(
            # Modified config only for single_transformer_blocks
            # NOTE: for FLUX.1, single_transformer_blocks should use a higher
            # residual_diff_threshold because precision error accumulates
            # from the preceding transformer_blocks.
            cache_config=DBCacheConfig().reset(
                Fn_compute_blocks=1,
                residual_diff_threshold=0.16,
            ),
        ),
    ],
)
```
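
The override behaviour described above (a shared config, with per-`blocks` values replacing only the fields they name) can be pictured with a small standalone model. This is a conceptual sketch, not cache-dit's implementation: `CacheConfigSketch`, `resolve_configs`, and the default values are hypothetical, and only the field names are borrowed from the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CacheConfigSketch:
    # Field names mirror options used in the guide; the defaults here are
    # illustrative assumptions, not cache-dit's real defaults.
    Fn_compute_blocks: int = 8
    Bn_compute_blocks: int = 0
    residual_diff_threshold: float = 0.08


def resolve_configs(shared, overrides_per_blocks):
    """Return one config per `blocks` entry: shared values plus its overrides."""
    return [replace(shared, **overrides) for overrides in overrides_per_blocks]


shared = CacheConfigSketch()
configs = resolve_configs(
    shared,
    [
        # Overrides for transformer_blocks
        {"Fn_compute_blocks": 8, "residual_diff_threshold": 0.08},
        # Overrides for single_transformer_blocks
        {"Fn_compute_blocks": 1, "residual_diff_threshold": 0.16},
    ],
)
for cfg in configs:
    print(cfg)
```

Fields that an override does not mention (here `Bn_compute_blocks`) keep the shared value, which is the intent behind calling `reset(...)` with only the options that should differ.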
344405

345406
### 🤖Cache Acceleration Stats Summary
346407

@@ -407,7 +468,7 @@ cache_dit.enable_cache(
407468
|Baseline(L20x1)|F1B0 (0.08)|F1B0 (0.20)|F8B8 (0.15)|F12B12 (0.20)|F16B16 (0.20)|
408469
|:---:|:---:|:---:|:---:|:---:|:---:|
409470
|24.85s|15.59s|8.58s|15.41s|15.11s|17.74s|
410-
|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/NONE_R0.08_S0.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F1B0S1_R0.08_S11.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F1B0S1_R0.2_S19.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F8B8S1_R0.15_S15.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F12B12S4_R0.2_S16.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F16B16S4_R0.2_S13.png width=105px>|
471+
|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/NONE_R0.08_S0.png width=140px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F1B0S1_R0.08_S11.png width=140px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F1B0S1_R0.2_S19.png width=140px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F8B8S1_R0.15_S15.png width=140px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F12B12S4_R0.2_S16.png width=140px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F16B16S4_R0.2_S13.png width=140px>|
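
The timings in this table translate into speedups by dividing the baseline by each cached run. The snippet below does that arithmetic with the numbers copied from the table; the variable names are illustrative only.

```python
# Timings copied from the DBCache table above, in seconds.
baseline_s = 24.85
timings_s = {
    "F1B0 (0.08)": 15.59,
    "F1B0 (0.20)": 8.58,
    "F8B8 (0.15)": 15.41,
    "F12B12 (0.20)": 15.11,
    "F16B16 (0.20)": 17.74,
}

# Speedup = baseline time / cached time.
speedups = {name: baseline_s / t for name, t in timings_s.items()}

for name, speedup in speedups.items():
    print(f"{name}: {speedup:.2f}x")
```

The fastest configuration in the table, F1B0 with a 0.20 threshold, comes out at roughly 2.9x over the baseline.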
411472

412473
## ⚡️DBPrune: Dynamic Block Prune
413474

@@ -441,7 +502,7 @@ cache_dit.enable_cache(
441502
Bn_compute_blocks=8, # Bn, B8, etc
442503
residual_diff_threshold=0.12,
443504
enable_dynamic_prune_threshold=True,
444-
non_prune_block_ids=list(range(16)),
505+
non_prune_block_ids=list(range(16,24)),
445506
),
446507
)
447508
```
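
This hunk changes `non_prune_block_ids` from `list(range(16))` to `list(range(16,24))`. The plain-Python snippet below shows which block indices each expression selects. It assumes, based on the option name, that the listed indices are the blocks excluded from pruning.

```python
# Old value in the guide: the first 16 blocks (indices 0..15).
old_ids = list(range(16))

# New value in the guide: 8 blocks starting at index 16 (indices 16..23).
new_ids = list(range(16, 24))

print(old_ids)
print(new_ids)

# The two selections do not overlap.
print(set(old_ids) & set(new_ids))
```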
@@ -454,7 +515,7 @@ cache_dit.enable_cache(
454515
|Baseline(L20x1)|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
455516
|:---:|:---:|:---:|:---:|:---:|:---:|
456517
|24.85s|19.43s|16.82s|15.95s|14.24s|10.66s|
457-
|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/NONE_R0.08_S0.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.03_P24.0_T19.43s.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.04_P34.6_T16.82s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.05_P38.3_T15.95s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.06_P45.2_T14.24s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.2_P59.5_T10.66s.png width=105px>|
518+
|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/NONE_R0.08_S0.png width=140px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.03_P24.0_T19.43s.png width=140px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.04_P34.6_T16.82s.png width=140px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.05_P38.3_T15.95s.png width=140px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.06_P45.2_T14.24s.png width=140px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.2_P59.5_T10.66s.png width=140px>|
458519

459520
## ⚡️Hybrid Cache CFG
460521

@@ -531,7 +592,7 @@ cache_dit.enable_cache(
531592
|Baseline(L20x1)|F1B0 (0.12)|+TaylorSeer|F1B0 (0.15)|+TaylorSeer|+compile|
532593
|:---:|:---:|:---:|:---:|:---:|:---:|
533594
|24.85s|12.85s|12.86s|10.27s|10.28s|8.48s|
534-
|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/NONE_R0.08_S0.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C0_DBCACHE_F1B0S1W0T0ET0_R0.12_S14_T12.85s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C0_DBCACHE_F1B0S1W0T1ET1_R0.12_S14_T12.86s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C0_DBCACHE_F1B0S1W0T0ET0_R0.15_S17_T10.27s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C0_DBCACHE_F1B0S1W0T1ET1_R0.15_S17_T10.28s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBCACHE_F1B0S1W0T1ET1_R0.15_S17_T8.48s.png width=105px>|
595+
|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/NONE_R0.08_S0.png width=140px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C0_DBCACHE_F1B0S1W0T0ET0_R0.12_S14_T12.85s.png width=140px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C0_DBCACHE_F1B0S1W0T1ET1_R0.12_S14_T12.86s.png width=140px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C0_DBCACHE_F1B0S1W0T0ET0_R0.15_S17_T10.27s.png width=140px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C0_DBCACHE_F1B0S1W0T1ET1_R0.15_S17_T10.28s.png width=140px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBCACHE_F1B0S1W0T1ET1_R0.15_S17_T8.48s.png width=140px>|
535596

536597

537598
## ⚡️Hybrid Context Parallelism
@@ -743,9 +804,8 @@ This function seamlessly integrates with both standard diffusion pipelines and c
743804
Whether the model runs CFG as a separate forward step, as in Wan 2.1 and Qwen-Image. For models that fuse CFG and non-CFG into a single forward step, set `enable_separate_cfg` to False. Examples include CogVideoX, HunyuanVideo, Mochi, etc.
744805
- `cfg_compute_first`: (`bool`, *required*, defaults to False):
745806
Whether to compute cfg forward first, default is False, meaning:
746-
0, 2, 4, ... -> non-CFG step;
747-
1, 3, 5, ... -> CFG step.
748-
- `cfg_diff_compute_separate`: (`bool`, *required*, defaults to True):
807+
0, 2, 4, ... -> non-CFG step; 1, 3, 5, ... -> CFG step.
808+
- `cfg_diff_compute_separate`: (`bool`, *required*, defaults to True):
749809
Whether to compute separate difference values for CFG and non-CFG steps, default is True. If False, we will use the computed difference from the current non-CFG transformer step for the current CFG step.
750810
- `num_inference_steps` (`int`, *optional*, defaults to None):
751811
num_inference_steps for DiffusionPipeline, used to adjust some internal settings
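
The parity convention given for `cfg_compute_first` above (with the default False, even transformer steps are non-CFG and odd steps are CFG) can be sketched as a small helper. `is_cfg_step` is a hypothetical function for illustration, and the inverted mapping for `cfg_compute_first=True` is an assumption inferred from the option name.

```python
def is_cfg_step(step_index: int, cfg_compute_first: bool = False) -> bool:
    """Classify a transformer forward call as a CFG step by its parity.

    cfg_compute_first=False: 0, 2, 4, ... -> non-CFG; 1, 3, 5, ... -> CFG.
    cfg_compute_first=True:  assumed to be the inverse mapping.
    """
    is_odd = step_index % 2 == 1
    return (not is_odd) if cfg_compute_first else is_odd


# Default ordering: non-CFG first, then CFG.
print([is_cfg_step(i) for i in range(6)])

# Assumed ordering when the CFG forward runs first.
print([is_cfg_step(i, cfg_compute_first=True) for i in range(6)])
```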

examples/parallelism/.gitignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
tmp
2+
*.png
3+
*.mp4
4+
__pycache__

examples/parallelism/run_kandinsky5_t2v_cp.py

Lines changed: 0 additions & 99 deletions
This file was deleted.
