21 changes: 17 additions & 4 deletions diffsynth_engine/pipelines/qwen_image.py
@@ -41,19 +41,32 @@ def _from_diffsynth(self, lora_state_dict: Dict[str, torch.Tensor]) -> Dict[str,
         dit_dict = {}
         for key, param in lora_state_dict.items():
             origin_key = key
-            if "lora_A.default.weight" not in key:
+            lora_a_suffix = None
+            if "lora_A.default.weight" in key:
+                lora_a_suffix = "lora_A.default.weight"
+            elif "lora_A.weight" in key:
+                lora_a_suffix = "lora_A.weight"
+
+            if lora_a_suffix is None:
                 continue
+
             lora_args = {}
             lora_args["down"] = param
-            lora_args["up"] = lora_state_dict[origin_key.replace("lora_A.default.weight", "lora_B.default.weight")]
+
+            lora_b_suffix = lora_a_suffix.replace("lora_A", "lora_B")
+            lora_args["up"] = lora_state_dict[origin_key.replace(lora_a_suffix, lora_b_suffix)]
+
             lora_args["rank"] = lora_args["up"].shape[1]
-            alpha_key = origin_key.replace("lora_A.default.weight", "alpha").replace("lora_up.default.weight", "alpha")
+            alpha_key = origin_key.replace("lora_up", "lora_A").replace(lora_a_suffix, "alpha")
+
             if alpha_key in lora_state_dict:
                 alpha = lora_state_dict[alpha_key]
             else:
                 alpha = lora_args["rank"]
             lora_args["alpha"] = alpha
-            key = key.replace(".lora_A.default.weight", "")
+
+            key = key.replace(f".{lora_a_suffix}", "")
+
             if key.startswith("transformer") and "attn.to_out.0" in key:
                 key = key.replace("attn.to_out.0", "attn.to_out")
             dit_dict[key] = lora_args
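The hunk above makes the converter accept both PEFT-style keys (`...lora_A.default.weight`) and plain `...lora_A.weight` keys. As a reading aid, here is a minimal, self-contained sketch of the same conversion; `convert_lora_keys` is a hypothetical standalone name (the real code lives in the `_from_diffsynth` method shown above), and the alpha lookup is simplified relative to the diff.

```python
from typing import Dict

import torch


def convert_lora_keys(lora_state_dict: Dict[str, torch.Tensor]) -> Dict[str, dict]:
    """Group LoRA tensors into per-module entries: down, up, rank, alpha."""
    dit_dict = {}
    for key, param in lora_state_dict.items():
        # Detect which naming scheme this checkpoint uses for the A ("down") matrix.
        if "lora_A.default.weight" in key:
            lora_a_suffix = "lora_A.default.weight"
        elif "lora_A.weight" in key:
            lora_a_suffix = "lora_A.weight"
        else:
            continue  # anchor only on A matrices; B and alpha are derived from them

        # The matching B ("up") matrix shares the key apart from the A/B suffix.
        lora_b_suffix = lora_a_suffix.replace("lora_A", "lora_B")
        up = lora_state_dict[key.replace(lora_a_suffix, lora_b_suffix)]
        rank = up.shape[1]

        # Alpha falls back to the rank when the checkpoint does not store it.
        alpha = lora_state_dict.get(key.replace(lora_a_suffix, "alpha"), rank)

        module_key = key.replace(f".{lora_a_suffix}", "")
        dit_dict[module_key] = {"down": param, "up": up, "rank": rank, "alpha": alpha}
    return dit_dict
```

Either naming scheme resolves to the same per-module entry, which is the point of the change.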
3 changes: 1 addition & 2 deletions docs/tutorial.md
@@ -88,7 +88,6 @@ We will continuously update DiffSynth-Engine to support more models. (Wan2.2 LoR
 
 After the model is downloaded, load the model with the corresponding pipeline and perform inference.
 
-
 ### Image Generation(Qwen-Image)
 
 The following code calls `QwenImagePipeline` to load the [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image) model and generate an image. Recommended resolutions are 928×1664, 1104×1472, 1328×1328, 1472×1104, and 1664×928, with a suggested cfg_scale of 4. If no negative_prompt is provided, it defaults to a single space character (not an empty string). For multi-GPU parallelism, currently only cfg parallelism is supported (parallelism=2), with other optimization efforts underway.
@@ -122,7 +121,7 @@ image.save("image.png")
 
 Please note that if some necessary modules, like text encoders, are missing from a model repository, the pipeline will automatically download the required files.
 
-#### Detailed Parameters(Qwen-Image)
+### Detailed Parameters(Qwen-Image)
 
 In the image generation pipeline `pipe`, we can use the following parameters for fine-grained control:
 
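The context line in the first hunk describes the Qwen-Image example (recommended resolutions, a suggested cfg_scale of 4, and the single-space default for negative_prompt). The snippet below is a rough usage sketch based only on that description: `QwenImagePipeline` and the generation parameters come from the tutorial text, while `fetch_model("Qwen/Qwen-Image")`, `QwenImagePipelineConfig`, and `from_pretrained` are assumptions patterned after the Flux and Wan examples elsewhere in the tutorial and may not match the actual API.

```python
# Hedged sketch, not copied from the tutorial; see the assumptions noted above.
from diffsynth_engine import QwenImagePipeline, QwenImagePipelineConfig, fetch_model  # config class name assumed

model_path = fetch_model("Qwen/Qwen-Image")  # ModelScope repo named in the tutorial
config = QwenImagePipelineConfig(model_path=model_path, device="cuda")  # assumed constructor
pipe = QwenImagePipeline.from_pretrained(config)

image = pipe(
    prompt="a cat sitting on a windowsill",
    negative_prompt=" ",   # default is a single space, not an empty string
    cfg_scale=4,           # suggested value in the tutorial
    width=1328,
    height=1328,           # one of the recommended resolutions
    num_inference_steps=30,
    seed=42,
)
image.save("image.png")
```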
17 changes: 9 additions & 8 deletions docs/tutorial_zh.md
@@ -2,13 +2,13 @@
 
 ## 安装
 
-在使用 DiffSynth-Engine 前,请先确保您的硬件设备满足以下要求
+在使用 DiffSynth-Engine 前,请先确保您的硬件设备满足以下要求:
 
 * NVIDIA GPU CUDA 计算能力 8.6+(例如 RTX 50 Series、RTX 40 Series、RTX 30 Series 等,详见 [NVidia 文档](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities))或 Apple Silicon M 系列芯片
 
-以及 Python 环境需求Python 3.10+。
+以及 Python 环境需求: Python 3.10+。
 
-使用 `pip3` 工具从 PyPI 安装 DiffSynth-Engine
+使用 `pip3` 工具从 PyPI 安装 DiffSynth-Engine:
 
 ```shell
 pip3 install diffsynth-engine
@@ -64,7 +64,7 @@ model_path = fetch_model("Wan-AI/Wan2.1-T2V-14B", path="diffusion_pytorch_model*
 
 ## 模型类型
 
-Diffusion 模型包含多种多样的模型结构,每种模型由对应的流水线进行加载和推理,目前我们支持的模型类型包括
+Diffusion 模型包含多种多样的模型结构,每种模型由对应的流水线进行加载和推理,目前我们支持的模型类型包括:
 
 | 模型结构 | 样例 | 流水线 |
 | --------------- | ------------------------------------------------------------ | ------------------- |
@@ -123,16 +123,17 @@ image.save("image.png")
 
 #### 详细参数(Qwen-Image)
 
-在图像生成流水线 `pipe` 中,我们可以通过以下参数进行精细的控制
+在图像生成流水线 `pipe` 中,我们可以通过以下参数进行精细的控制:
 
 * `prompt`: 提示词,用于描述生成图像的内容,支持多种语言(中文/英文/日文等),例如“一只猫”/"a cat"/"庭を走る猫"。
 * `negative_prompt`: 负面提示词,用于描述不希望图像中出现的内容,例如“ugly”,默认为一个空格而不是空字符串, " "。
-* `cfg_scale`: [Classifier-free guidance](https://arxiv.org/abs/2207.12598) 的引导系数,通常更大的引导系数可以达到更强的文图相关性,但会降低生成内容的多样性,推荐值为4。
+* `cfg_scale`:[Classifier-free guidance](https://arxiv.org/abs/2207.12598) 的引导系数,通常更大的引导系数可以达到更强的文图相关性,但会降低生成内容的多样性,推荐值为4。
 * `height`: 图像高度。
 * `width`: 图像宽度。
 * `num_inference_steps`: 推理步数,通常推理步数越多,计算时间越长,图像质量越高。
 * `seed`: 随机种子,固定的随机种子可以使生成的内容固定。
 
+
 ### 图像生成
 
 以下代码可以调用 `FluxImagePipeline` 加载[麦橘超然](https://www.modelscope.cn/models/MAILAND/majicflus_v1/summary?version=v1.0)模型生成一张图。如果要加载其他结构的模型,请将代码中的 `FluxImagePipeline` 和 `FluxPipelineConfig` 替换成对应的流水线模块及配置。
@@ -152,7 +153,7 @@ image.save("image.png")
 
 #### 详细参数
 
-在图像生成流水线 `pipe` 中,我们可以通过以下参数进行精细的控制
+在图像生成流水线 `pipe` 中,我们可以通过以下参数进行精细的控制:
 
 * `prompt`: 提示词,用于描述生成图像的内容,例如“a cat”。
 * `negative_prompt`: 负面提示词,用于描述不希望图像中出现的内容,例如“ugly”。
@@ -217,7 +218,7 @@ save_video(video, "video.mp4")
 
 #### 详细参数
 
-在视频生成流水线 `pipe` 中,我们可以通过以下参数进行精细的控制
+在视频生成流水线 `pipe` 中,我们可以通过以下参数进行精细的控制:
 
 * `prompt`: 提示词,用于描述生成图像的内容,例如“a cat”。
 * `negative_prompt`: 负面提示词,用于描述不希望图像中出现的内容,例如“ugly”。
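The last hunk touches the video-pipeline parameter list (`prompt`, `negative_prompt`, `num_inference_steps`, `seed`), and the hunk headers show `fetch_model("Wan-AI/Wan2.1-T2V-14B", ...)` and `save_video(video, "video.mp4")`. A comparable, heavily hedged sketch follows; `WanVideoPipeline` and `WanPipelineConfig` are hypothetical names introduced here for illustration, and the top-level imports are assumptions.

```python
# Hedged sketch for the video example referenced by the hunks above; class and
# config names are assumed, only fetch_model/save_video and the parameter names
# are taken from this diff.
from diffsynth_engine import fetch_model, save_video  # assumed to be top-level exports
from diffsynth_engine import WanVideoPipeline, WanPipelineConfig  # hypothetical names

model_path = fetch_model("Wan-AI/Wan2.1-T2V-14B", path="diffusion_pytorch_model*")
config = WanPipelineConfig(model_path=model_path, device="cuda")  # assumed constructor
pipe = WanVideoPipeline.from_pretrained(config)

video = pipe(
    prompt="a cat",          # example prompt from the parameter list
    negative_prompt="ugly",  # example negative prompt from the parameter list
    num_inference_steps=30,
    seed=42,
)
save_video(video, "video.mp4")
```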