
In our implementation of Qwen-Image-Edit, we employ block causal attention to improve inference speed.


ModelTC/Qwen-Image-Edit-Causal


Qwen-Image-Edit-Causal

We employ block causal attention to improve the inference speed of Qwen-Image-Edit-2511.

🔥 Latest News

📑 Todo List

  • Qwen-Image-Edit-Causal
  • Qwen-Image-Edit-Interactive (multi-turn edit)

📑 Methodology

The figure below illustrates the core design of Qwen-Image-Edit-Causal: reference-image queries attend only to their own keys and values, which reduces training-time computation and decouples the attention of reference images from the number of inference steps.
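A minimal NumPy sketch of the masking idea, for illustration only. It assumes a joint token sequence laid out as [target tokens | reference image 1 | reference image 2 | ...], where target queries may attend to every token and each reference image's queries attend only to that reference's own keys and values. The layout, the helper names `block_causal_mask` and `masked_attention`, and the omission of text tokens, projections, and multiple heads are all assumptions; this is not the repository's implementation.

```python
import numpy as np


def block_causal_mask(target_len: int, ref_lens: list[int]) -> np.ndarray:
    """Return a boolean mask where mask[i, j] is True if query i may attend to key j.

    Hypothetical token layout: [target | ref 1 | ref 2 | ...].
    """
    total = target_len + sum(ref_lens)
    mask = np.zeros((total, total), dtype=bool)
    # Target (noisy latent) queries attend to every token, including all references.
    mask[:target_len, :] = True
    start = target_len
    for n in ref_lens:
        # Reference queries attend only to the keys/values of their own block.
        mask[start:start + n, start:start + n] = True
        start += n
    return mask


def masked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention with a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    # Every row has at least one allowed key (its own block), so the row max is finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
target_len, ref_lens, dim = 4, [3, 2], 8
mask = block_causal_mask(target_len, ref_lens)
refs = rng.standard_normal((sum(ref_lens), dim))

# Two different "denoising steps": same reference tokens, different target tokens.
x1 = np.concatenate([rng.standard_normal((target_len, dim)), refs])
x2 = np.concatenate([rng.standard_normal((target_len, dim)), refs])
out1 = masked_attention(x1, x1, x1, mask)
out2 = masked_attention(x2, x2, x2, mask)

# Reference outputs do not depend on the target tokens.
print(np.allclose(out1[target_len:], out2[target_len:]))  # True
```

Because the reference rows never read the target tokens, the reference outputs of this toy layer could be computed once and reused across denoising steps. In a full transformer that also requires the reference path not to depend on the timestep, which this sketch does not model.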

[Figure: Qwen-Image-Edit-Causal attention design]

📑 Performance Reports

Speed Test

Test environment: H100 GPU with the SDPA attention backend. All tests use a fixed image size of 1024 x 1024. To reproduce the results, see Run Speed Test.

| Method | NFE | 1 ref image | 2 ref images | 3 ref images |
| --- | --- | --- | --- | --- |
| Qwen-Image-Edit-2511 | 80 | 37.001 s | 63.300 s | 93.586 s |
| Qwen-Image-Edit-2511-Lightning | 4 | 1.847 s | 3.160 s | 4.664 s |
| Qwen-Image-Edit-Causal | 4 | 1.274 s | 1.684 s | 2.088 s |
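The relative speedups follow directly from the speed-test timings. A small sketch that computes them, along with the average extra latency each additional reference image adds (the variable names are illustrative):

```python
# Latency in seconds for 1, 2, and 3 reference images, copied from the speed-test table.
timings = {
    "Qwen-Image-Edit-2511 (80 NFE)": [37.001, 63.300, 93.586],
    "Qwen-Image-Edit-2511-Lightning (4 NFE)": [1.847, 3.160, 4.664],
    "Qwen-Image-Edit-Causal (4 NFE)": [1.274, 1.684, 2.088],
}
causal = timings["Qwen-Image-Edit-Causal (4 NFE)"]

# Speedup of Qwen-Image-Edit-Causal over each method, per reference-image count.
speedups = {name: [t / c for t, c in zip(times, causal)] for name, times in timings.items()}

# Average extra latency per additional reference image (slope from 1 to 3 refs).
per_ref_cost = {name: (times[-1] - times[0]) / (len(times) - 1) for name, times in timings.items()}

for name in timings:
    ratios = ", ".join(f"{s:.2f}x" for s in speedups[name])
    print(f"{name}: Causal speedup {ratios}; about {per_ref_cost[name]:.2f} s per extra ref")
```

At the same 4 NFE, Qwen-Image-Edit-Causal is about 1.45x faster than Qwen-Image-Edit-2511-Lightning with one reference image and about 2.23x faster with three, since each extra reference image adds roughly 0.4 s instead of roughly 1.4 s.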

Quality Comparison

Each case shows the input image(s), the prompt, and the outputs of Qwen-Image-Edit-2511 (40 steps) and Qwen-Image-Edit-Causal (4 steps). The prompts are:

  • Make the girl from Image 1 wear the necklace from Image 2 and carry the bag from Image 3 on her left shoulder.
  • The monk in the Image 1 and The woman in the Image 2 are standing close, holding hands, suggesting a moment of connection or intimacy. They appear to be in a grand hall with ornate lighting and decorations, indicating a formal or celebratory setting. The shot size is medium, capturing both characters from the waist up, allowing for a clear view of their expressions and upper body gestures.
  • Change the character's hair color from blonde to white, and add a hard side light coming from the right side of the image, so the shadows on the left half of the face become more pronounced.
  • (Translated from Chinese) Change the person in the image to Japanese anime style, and add the text “使用Lightx2V Qwen-Image-Lightning 加速图像生成和图片编辑” ("Use Lightx2V Qwen-Image-Lightning to accelerate image generation and image editing") to the image.
  • Generate an image that matches the depth map, following this description: A dilapidated red bicycle is parked on a muddy path with a dense primeval forest in the background.
  • Make the girl from Image 1 wear the black dress from Image 2 and sit in the pose from Image 3.

🚀 Run Evaluation and Test with Diffusers

Installation

Install the Python environment with uv:

```bash
git clone https://github.com/ModelTC/Qwen-Image-Edit-Causal.git
cd Qwen-Image-Edit-Causal
uv venv
uv sync
source .venv/bin/activate
```

Run Qwen-Image-Edit-Causal Model

```bash
python generate_with_diffusers.py \
--model_name lightx2v/Qwen-Image-Edit-Causal \
--prompt_list_file examples/prompt_list.txt \
--image_path_list_file examples/image_path_list.txt \
--out_dir results/Qwen-Image-Edit-Causal \
--base_seed 0 --steps 4 --cfg 1.0
```

Run Speed Test

```bash
# Qwen-Image-Edit-Causal
python test_inference_speed.py \
--model_name lightx2v/Qwen-Image-Edit-Causal \
--is_causal 1 --steps 4 --cfg 1

# Qwen-Image-Edit-2511-Lightning
python test_inference_speed.py \
--model_name Qwen/Qwen-Image-Edit-2511 \
--is_causal 0 --steps 4 --cfg 1

# Qwen-Image-Edit-2511
python test_inference_speed.py \
--model_name Qwen/Qwen-Image-Edit-2511 \
--is_causal 0 --steps 40 --cfg 4
```
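The NFE column of the speed table relates to the `--steps` and `--cfg` values in these commands. A small sketch, assuming that a `--cfg` value above 1 enables classifier-free guidance with one conditional and one unconditional forward pass per step, while `--cfg 1` runs a single pass. This assumption matches the table (40 steps at cfg 4 gives 80 NFE; 4 steps at cfg 1 gives 4 NFE). The helper name `nfe` is hypothetical and not part of the repository.

```python
def nfe(steps: int, cfg: float) -> int:
    """Number of transformer forward passes (function evaluations) per image.

    Assumption: cfg > 1 runs a conditional and an unconditional pass each step.
    """
    passes_per_step = 2 if cfg > 1 else 1
    return steps * passes_per_step


# (method, --is_causal, --steps, --cfg) taken from the speed-test commands.
configs = [
    ("Qwen-Image-Edit-Causal", 1, 4, 1.0),
    ("Qwen-Image-Edit-2511-Lightning", 0, 4, 1.0),
    ("Qwen-Image-Edit-2511", 0, 40, 4.0),
]
for name, is_causal, steps, cfg in configs:
    print(f"{name}: is_causal={is_causal}, steps={steps}, cfg={cfg} -> {nfe(steps, cfg)} NFE")
```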

License Agreement

The models in this repository are licensed under the Apache 2.0 License. We claim no rights over the content you generate, and you are free to use it provided that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations. For a complete list of restrictions and details regarding your rights, please refer to the full text of the license.

Acknowledgements

This project builds upon and reuses code from Qwen-Image and Qwen-Image-Lightning, both licensed under the Apache License 2.0.

The evaluation text prompts are from Qwen-Image, Qwen-Image Blog and Qwen-Image-Service.

The image-editing test cases are from Qwen-Image-Edit-api, Reddit, and Chat-Qwen-AI.
