
Commit 28a0ef1

docs: add parallelism docs (vipshop#345)
1 parent 4533116 commit 28a0ef1

File tree

1 file changed: +35 -7 lines


docs/User_Guide.md

Lines changed: 35 additions & 7 deletions
````diff
@@ -538,7 +538,7 @@ cache_dit.enable_cache(
 
 <div id="context-parallelism"></div>
 
-cache-dit is compatible with context parallelism. Currently, we support the use of `Hybrid Cache` + `Context Parallelism` scheme (via NATIVE_DIFFUSER parallelism backend) in cache-dit. Users can use Context Parallelism to further accelerate the speed of inference! For more details, please refer to [📚examples/parallelism](https://github.com/vipshop/cache-dit/tree/main/examples/parallelism).
+cache-dit is compatible with context parallelism. Currently, we support the `Hybrid Cache` + `Context Parallelism` scheme (via the NATIVE_DIFFUSER parallelism backend) in cache-dit. Users can apply Context Parallelism to further accelerate inference! For more details, please refer to [📚examples/parallelism](https://github.com/vipshop/cache-dit/tree/main/examples/parallelism). Context parallelism is currently supported for [FLUX.1](https://huggingface.co/black-forest-labs/FLUX.1-dev), [Qwen-Image](https://github.com/QwenLM/Qwen-Image), [LTXVideo](https://huggingface.co/Lightricks/LTX-Video), [Wan2.1](https://github.com/Wan-Video/Wan2.1) and [Wan2.2](https://github.com/Wan-Video/Wan2.2); more models will be supported in the future.
 
 ```python
 # pip3 install "cache-dit[parallelism]"
````
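The code block shown in this hunk is truncated by the diff. Below is a minimal, self-contained sketch of the intended launch pattern, based on the `cache_dit.enable_cache` call and the `torchrun --nproc_per_node=2 parallel_cache.py` command documented in this guide. The commented `parallelism_config` keyword is a hypothetical placeholder, not the confirmed cache-dit argument; the authoritative configuration lives in [📚examples/parallelism](https://github.com/vipshop/cache-dit/tree/main/examples/parallelism).

```python
# parallel_cache.py -- a minimal sketch, not the verbatim cache-dit parallelism API.
# pip3 install "cache-dit[parallelism]"
# Launch with: torchrun --nproc_per_node=2 parallel_cache.py
import os

import torch
import cache_dit
from diffusers import FluxPipeline

# Standard torchrun device assignment: one GPU per local rank.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to(f"cuda:{local_rank}")

# Hybrid Cache; the context-parallel setup (NATIVE_DIFFUSER backend, number of
# ranks) follows examples/parallelism. The keyword below is a hypothetical
# placeholder, not the confirmed argument name.
cache_dit.enable_cache(
    pipe,
    # parallelism_config=...,  # hypothetical: NATIVE_DIFFUSER backend, 2 ranks
)

image = pipe(
    "A cat holding a sign that says hello world",
    num_inference_steps=28,
).images[0]

# Only rank 0 writes the output when running under torchrun.
rank = torch.distributed.get_rank() if torch.distributed.is_initialized() else 0
if rank == 0:
    image.save("flux_cp.png")
```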
````diff
@@ -557,7 +557,7 @@ cache_dit.enable_cache(
 
 <div id="tensor-parallelism"></div>
 
-cache-dit is also compatible with tensor parallelism. Currently, we support the use of `Hybrid Cache` + `Tensor Parallelism` scheme (via NATIVE_PYTORCH parallelism backend) in cache-dit. Users can use Tensor Parallelism to further accelerate the speed of inference and **reduce the VRAM usage per GPU**! For more details, please refer to [📚examples/parallelism](https://github.com/vipshop/cache-dit/tree/main/examples/parallelism).
+cache-dit is also compatible with tensor parallelism. Currently, we support the `Hybrid Cache` + `Tensor Parallelism` scheme (via the NATIVE_PYTORCH parallelism backend) in cache-dit. Users can apply Tensor Parallelism to further accelerate inference and **reduce the VRAM usage per GPU**! For more details, please refer to [📚examples/parallelism](https://github.com/vipshop/cache-dit/tree/main/examples/parallelism). Tensor parallelism is currently supported for [FLUX.1](https://huggingface.co/black-forest-labs/FLUX.1-dev), [Qwen-Image](https://github.com/QwenLM/Qwen-Image), [Wan2.1](https://github.com/Wan-Video/Wan2.1) and [Wan2.2](https://github.com/Wan-Video/Wan2.2); more models will be supported in the future.
 
 ```python
 # pip3 install "cache-dit[parallelism]"
````
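Since the headline benefit of tensor parallelism here is lower per-GPU VRAM, a quick way to verify it is to record the peak allocated memory on each rank after a run. The snippet below is plain PyTorch (not cache-dit-specific) and assumes it is appended to a script launched with `torchrun`, as in the sketch above.

```python
# Plain-PyTorch helper to report peak VRAM per rank after inference.
# Assumes the script was launched with torchrun and the pipeline has already run.
import torch
import torch.distributed as dist


def report_peak_vram(tag: str = "after inference") -> None:
    rank = dist.get_rank() if dist.is_initialized() else 0
    device = torch.cuda.current_device()
    peak_gib = torch.cuda.max_memory_allocated(device) / (1024 ** 3)
    print(f"[rank {rank}] peak VRAM {tag}: {peak_gib:.2f} GiB")


# Call once after the pipeline finishes; with tensor parallelism enabled,
# each rank should report a noticeably smaller peak than a single-GPU run.
report_peak_vram()
```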
````diff
@@ -572,7 +572,8 @@ cache_dit.enable_cache(
 # torchrun --nproc_per_node=2 parallel_cache.py
 ```
 
-Please note that in the short term, we have no plans to support Hybrid Parallelism. Please choose to use either Context Parallelism or Tensor Parallelism based on your actual scenario.
+> [!Important]
+> Please note that in the short term, we have no plans to support Hybrid Parallelism. Please choose either Context Parallelism or Tensor Parallelism based on your actual scenario.
 
 ## 🤖Low-bits Quantization
 
````
````diff
@@ -587,8 +588,8 @@ import cache_dit
 cache_dit.enable_cache(pipe_or_adapter)
 
 # float8, float8_weight_only, int8, int8_weight_only, int4, int4_weight_only
-# int4_weight_only required `fbgemm-gpu-genai>=1.2.0`, which is only support
-# Compute Arch >= Hopper (not support for Ada, Ampere, ..., etc.)
+# int4_weight_only requires fbgemm-gpu-genai>=1.2.0, which only supports
+# Compute Architectures >= Hopper (and does not support Ada, ..., etc.)
 pipe.transformer = cache_dit.quantize(
     pipe.transformer, quant_type="float8_weight_only"
 )
@@ -597,6 +598,33 @@ pipe.text_encoder = cache_dit.quantize(
 )
 ```
 
+For **4-bits W4A16 (weight only)** quantization, we recommend `nf4` from **bitsandbytes** due to its better compatibility with many devices. Users can use it directly via the `quantization_config` of diffusers. For example:
+
+```python
+from diffusers import QwenImagePipeline
+from diffusers.quantizers import PipelineQuantizationConfig
+
+pipe = QwenImagePipeline.from_pretrained(
+    "Qwen/Qwen-Image",
+    torch_dtype=torch.bfloat16,
+    quantization_config=(
+        PipelineQuantizationConfig(
+            quant_backend="bitsandbytes_4bit",
+            quant_kwargs={
+                "load_in_4bit": True,
+                "bnb_4bit_quant_type": "nf4",
+                "bnb_4bit_compute_dtype": torch.bfloat16,
+            },
+            components_to_quantize=["text_encoder", "transformer"],
+        )
+    ),
+).to("cuda")
+
+# Then, apply cache acceleration using cache-dit
+cache_dit.enable_cache(pipe, cache_config=...)
+```
+
+
 ## 🛠Metrics Command Line
 
 <div id="metrics"></div>
````
````diff
@@ -661,7 +689,7 @@ Unified Cache API for almost Any Diffusion Transformers (with Transformer Blocks
 ### 👏API: enable_cache
 
 ```python
-def enable_cache(...) -> Union[DiffusionPipeline, BlockAdapter]
+def enable_cache(...) -> Union[DiffusionPipeline, BlockAdapter, Transformer]
 ```
 
 ### 🌟Function Description
@@ -688,7 +716,7 @@ This function seamlessly integrates with both standard diffusion pipelines and c
 
 ### 👇Parameter Description
 
-- **pipe_or_adapter**(`DiffusionPipeline` or `BlockAdapter`, *required*):
+- **pipe_or_adapter**(`DiffusionPipeline`, `BlockAdapter` or `Transformer`, *required*):
 The standard Diffusion Pipeline or custom BlockAdapter (from cache-dit or user-defined).
 For example: `cache_dit.enable_cache(FluxPipeline(...))`.
 Please check https://github.com/vipshop/cache-dit/blob/main/docs/User_Guide.md for the usage of BlockAdapter.
````
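This commit widens both the return annotation and the accepted types of `pipe_or_adapter`. The sketch below illustrates the call forms implied by the new signature: only the pipeline form is explicitly documented above (`cache_dit.enable_cache(FluxPipeline(...))`); the bare-transformer and BlockAdapter forms are assumptions inferred from the updated `Union[DiffusionPipeline, BlockAdapter, Transformer]` annotation, so check the linked User_Guide for the exact arguments.

```python
# A minimal sketch of the call forms suggested by the updated signature.
# The pipeline form is documented; the other two are inferred from the
# Union[DiffusionPipeline, BlockAdapter, Transformer] annotation above.
import torch
import cache_dit
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# 1) Standard pipeline (documented form).
cache_dit.enable_cache(pipe)

# 2) Bare transformer (assumed form, inferred from the new Transformer type).
# cache_dit.enable_cache(pipe.transformer)

# 3) Custom BlockAdapter (see the BlockAdapter section of the User Guide).
# cache_dit.enable_cache(cache_dit.BlockAdapter(...))
```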
