feat&fix(diffusers): add QwenImage lora finetune, new models and pipes and fix bugs #1394
Conversation
Summary of Changes (Gemini Code Assist): This pull request significantly enhances the Qwen-Image model's training capabilities by introducing a dedicated LoRA fine-tuning script. It addresses several underlying issues in LoRA weight conversion and rotary embedding application, ensuring smoother and more robust training workflows. Additionally, the integration of gradient checkpointing and DeepSpeed Zero3 support aims to improve memory efficiency and scalability for distributed training environments.
Code Review
This pull request introduces a LoRA fine-tuning script for the Qwen-Image model and includes several bug fixes to support this functionality. The new example script is comprehensive, but contains some hardcoded values that could be parameterized for better flexibility. The fixes in the LoRA loading utilities and the Qwen-Image transformer model are valuable, particularly the addition of gradient checkpointing support, which is crucial for training large models. Overall, this is a solid contribution that enhances the usability of the Qwen-Image model.
```python
encoder_hidden_states=encoder_hidden_states,
encoder_hidden_states_mask=encoder_hidden_states_mask,
timestep=timestep / 1000,
img_shapes=[(1, 32, 32)],
```
The `img_shapes` argument is hardcoded to `[(1, 32, 32)]`. This is likely correct for the default 512x512 input size, but it will be incorrect if the `height` or `width` arguments are changed. It should be calculated dynamically from the input dimensions: the latent shape is typically `image_dim // vae_scale_factor // patch_size`, which is `image_dim // 16` in this case.
Suggested change:

```diff
-img_shapes=[(1, 32, 32)],
+img_shapes=[(1, self.args.height // 16, self.args.width // 16)],
```
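For reference, a hedged sketch of deriving `img_shapes` from the configured resolution; the `args.height`/`args.width` names follow the suggestion above, and `vae_scale_factor = 8` with `patch_size = 2` are the usual Qwen-Image values behind the `// 16` divisor:

```python
# Sketch: derive img_shapes from the configured resolution instead of hardcoding it.
# vae_scale_factor = 8 and patch_size = 2 are the usual Qwen-Image values (8 * 2 = 16).
vae_scale_factor = 8
patch_size = 2
latent_height = args.height // vae_scale_factor // patch_size
latent_width = args.width // vae_scale_factor // patch_size
img_shapes = [(1, latent_height, latent_width)]  # [(1, 32, 32)] for 512x512 inputs
```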
```python
train_indices = list(range(666))
eval_indices = list(range(666, 833))
```
```python
def process_function(examples):
    image = Image.open(io.BytesIO(examples["image"]["bytes"])).convert("RGB").resize((512, 512))
```
The image resize dimensions `(512, 512)` are hardcoded. It's better to use `data_args.height` and `data_args.width` to allow easy configuration of the image size.
Suggested change:

```diff
-image = Image.open(io.BytesIO(examples["image"]["bytes"])).convert("RGB").resize((512, 512))
+image = Image.open(io.BytesIO(examples["image"]["bytes"])).convert("RGB").resize((data_args.width, data_args.height))
```
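If helpful, a small sketch of the parameterized preprocessing; this is hypothetical code, and the `data_args` closure and the returned field name are assumptions rather than the script's actual implementation:

```python
# Hypothetical sketch: use data_args instead of the hardcoded 512x512 resize.
import io
from PIL import Image

def process_function(examples):
    image = (
        Image.open(io.BytesIO(examples["image"]["bytes"]))
        .convert("RGB")
        .resize((data_args.width, data_args.height))  # configurable target size
    )
    examples["image"] = image  # field name is illustrative
    return examples
```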
```python
height=height,
width=width,
dtype=encoder_hidden_states.dtype,
generator=np.random.Generator(np.random.PCG64(seed=42)),
```
Fzilan left a comment
Could you please add a README file for the LoRA finetune?
What does this PR do?
Adds

- `QwenImageEditPlusPipeline`, targeting feat: Add QwenImageEditPlus to support future feature upgrades
- `QwenImageControlNetModel` and `QwenImageMultiControlNetModel`, and pipelines `QwenImageControlNetPipeline` and `QwenImageControlNetInpaintPipeline`, targeting Support ControlNet for Qwen-Image and Support ControlNet-Inpainting for Qwen-Image
- `QwenImageAutoBlocks`, `QwenImageEditAutoBlocks`, `QwenImageEditPlusAutoBlocks`, `QwenImageModularPipeline`, `QwenImageEditModularPipeline`, `QwenImageEditPlusModularPipeline`, targeting [Modular] Qwen, [modular] add tests for qwen modular and [core] support QwenImage Edit Plus in modular

Fixes according to diffusers merged PRs
Usage
More details about the `controlnet_script.py` have been attached below; a rough usage sketch follows the list.

- Single control image
- Multiple control images
- Control inpaint
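As a rough orientation only, a single-control-image call might look like the sketch below. The checkpoint name `InstantX/Qwen-Image-ControlNet-Union`, the `mindspore_dtype` argument, and the call signature are assumptions based on the upstream diffusers ControlNet pipelines, not a copy of the attached `controlnet_script.py`.

```python
# Hedged sketch of single-control-image usage; names and arguments are assumptions.
import mindspore as ms
from PIL import Image
from mindone.diffusers import QwenImageControlNetModel, QwenImageControlNetPipeline

controlnet = QwenImageControlNetModel.from_pretrained(
    "InstantX/Qwen-Image-ControlNet-Union",  # illustrative checkpoint name
    mindspore_dtype=ms.bfloat16,
)
pipe = QwenImageControlNetPipeline.from_pretrained(
    "Qwen/Qwen-Image", controlnet=controlnet, mindspore_dtype=ms.bfloat16
)

control_image = Image.open("control.png")  # e.g. a canny/depth condition image
image = pipe(
    prompt="a photo of a cat sitting on a bench",
    control_image=control_image,
    controlnet_conditioning_scale=1.0,
    num_inference_steps=30,
)[0][0]  # mindone pipelines return a (images,) tuple by default
image.save("qwen_image_controlnet.png")
```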
More details about the `test_modular.py` have been attached below.

- QwenImageModularPipeline
Performance

(Inference experiments were run on Ascend Atlas 800T A2 machines with MindSpore 2.7.1; finetune experiments were run on Ascend Atlas 800T A2 machines with MindSpore 2.7.0.)
Limitations
- … with `ms._no_grad()`.
- `def compute_loss` in `mindone/transformers/trainer.py` has not yet been implemented, so it is not possible to directly specify a loss function for automatic calculation. Instead, we manually define the computation in the script by constructing a class `TrainStepForQwenImage`.
- `def merge_and_unload` in `mindone/peft/tuners/lora/model.py` is used to save the fine-tuned weights. However, in practice, it does not automatically merge the Zero3-split weights. To address this, we use `ops.AllGather()` in the fine-tuning script to combine the split weights and save them as a complete set of weights (a rough sketch of the idea follows this list).
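The sketch below illustrates the `ops.AllGather()` workaround only in outline; it assumes the communication group is already initialized and that Zero3 shards each parameter along dim 0, and the finetune script's actual filtering and reshaping may differ.

```python
# Sketch: gather Zero3-sharded LoRA weights before saving a full checkpoint.
# Assumes mindspore.communication.init() has been called and each rank holds a
# contiguous shard of every parameter along axis 0.
import mindspore as ms
from mindspore import ops
from mindspore.communication import get_rank

all_gather = ops.AllGather()

def gather_lora_weights(network):
    merged = []
    for name, param in network.parameters_and_names():
        if "lora" in name:
            full = all_gather(param)  # concatenates the per-rank shards along axis 0
            merged.append({"name": name, "data": full})
    return merged

merged_weights = gather_lora_weights(transformer)  # `transformer` is the LoRA-wrapped model (illustrative)
if get_rank() == 0:
    ms.save_checkpoint(merged_weights, "qwen_image_lora.ckpt")
```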
Before submitting

- Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the documentation guidelines.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@xxx