```bash
conda create -n face-mogle python=3.11.11
conda activate face-mogle
pip install -r requirements.txt
```
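Optionally, you can run a quick sanity check that the environment is usable. This is a minimal sketch, assuming `requirements.txt` installs PyTorch (which the inference and training scripts rely on):

```python
# Quick environment sanity check (assumes requirements.txt installs PyTorch).
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```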
Before running inference, testing, or the Gradio demo, please download the following files:

- Pretrained model: FLUX.1-dev (DiT-based)
- SFT weights: pytorch_lora_weights.safetensors (LoRA) & global_local_mask_moe.pt (MoGLE)
After downloading, please place the files in the following structure:
```
Face-MoGLE
├── ...
├── checkpoints
│   ├── FLUX.1-dev
├── runs
│   ├── face-mogle
│   │   ├── pytorch_lora_weights.safetensors
│   │   ├── global_local_mask_moe.pt
│   │   ├── config.yaml
```
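If you prefer a scripted download, here is a minimal sketch using `huggingface_hub`. The FLUX.1-dev repo id is the official one; note that it is a gated repo, so you may need to accept its license and authenticate (e.g., via `huggingface-cli login`) first. The SFT weights should be fetched from the link above:

```python
# Sketch: fetch the FLUX.1-dev base model with huggingface_hub.
# FLUX.1-dev is gated on Hugging Face; accept its license and log in first.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="black-forest-labs/FLUX.1-dev",  # official FLUX.1-dev repo
    local_dir="checkpoints/FLUX.1-dev",
)
```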
- Text-to-Face Generation

  ```bash
  python inference.py --prompt "She is wearing lipstick. She is attractive and has straight hair."
  ```

- Mask-to-Face Generation

  ```bash
  python inference.py --mask "assets/readme_demo/27000.png"
  ```

- (Text+Mask)-to-Face Generation

  ```bash
  python inference.py \
      --prompt "She is wearing lipstick. She is attractive and has straight hair." \
      --mask "assets/readme_demo/27000.png"
  ```
Text Prompt | Semantic Mask | Generated Face |
---|---|---|
“She is wearing lipstick. She is attractive and has straight hair.” | ∅ | ![]() |
∅ | ![]() | ![]() |
“She is wearing lipstick. She is attractive and has straight hair.” | ![]() | ![]() |
```bash
CUDA_VISIBLE_DEVICES=0 python gradio_app.py
```
Demo video: `Demo.mp4`
You can download the datasets from Hugging Face:
Dataset Name | Download Link | Usage |
---|---|---|
MM-CelebA-HQ | Hugging Face (also available in TediGAN) | Training & Evaluation |
MM-FairFace-HQ | Hugging Face | Zero-shot Generalization Validation only |
MM-FFHQ-Female | Hugging Face | Zero-shot Generalization Validation only |
Note:
The MM-FairFace-HQ and MM-FFHQ-Female datasets are multimodal extensions we constructed based on the original face image datasets, using a semi-automated annotation approach.
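For scripted downloads, a minimal sketch with `huggingface_hub` is shown below. The `repo_id` is a placeholder, not the real dataset id; substitute the actual id from the Hugging Face links in the table above.

```python
# Sketch: scripted dataset download via huggingface_hub.
# NOTE: repo_id below is a placeholder; take the real id from the table above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<org>/MM-CelebA-HQ",  # placeholder repo id
    repo_type="dataset",
    local_dir="data/mmcelebahq",
)
```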
After extraction, please organize the directory as follows:
```
Face-MoGLE
├── ...
├── data
│   ├── mmcelebahq
│   │   ├── face
│   │   │   ├── 0.jpg
│   │   │   ├── 1.jpg
│   │   ├── mask
│   │   │   ├── 0.png
│   │   │   ├── 1.png
│   │   ├── text
│   │   │   ├── 0.txt
│   │   │   ├── 1.txt
│   │   ├── text.json
│   ├── mmffhqfemale
│   │   ├── face
│   │   │   ├── 00001.jpg
│   │   │   ├── 00002.jpg
│   │   ├── mask
│   │   │   ├── 00001.png
│   │   │   ├── 00002.png
│   │   ├── text
│   │   │   ├── 00001.txt
│   │   │   ├── 00002.txt
│   │   ├── text.json
│   ├── mmfairfacehq
│   │   ├── face
│   │   │   ├── 52.jpg
│   │   │   ├── 55.jpg
│   │   ├── mask
│   │   │   ├── 52.png
│   │   │   ├── 55.png
│   │   ├── text
│   │   │   ├── 52.txt
│   │   │   ├── 55.txt
```
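For reference, here is a minimal PyTorch-style dataset sketch that mirrors the face/mask/text layout above. The class and field names are illustrative only; it is not the repo's actual data loader.

```python
# Sketch: minimal dataset over the face/mask/text layout above (illustrative only).
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset


class MMFaceDataset(Dataset):
    def __init__(self, root="data/mmcelebahq"):
        self.root = Path(root)
        # Sample ids are derived from the face image filenames.
        self.ids = sorted(p.stem for p in (self.root / "face").glob("*.jpg"))

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, i):
        stem = self.ids[i]
        face = Image.open(self.root / "face" / f"{stem}.jpg").convert("RGB")
        mask = Image.open(self.root / "mask" / f"{stem}.png")
        text = (self.root / "text" / f"{stem}.txt").read_text().strip()
        return {"face": face, "mask": mask, "text": text}
```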
```bash
bash script/train_face-mogle.sh
```
```bash
python test.py \
    --root data/mmcelebahq \
    --lora_ckpt runs/face-mogle/pytorch_lora_weights.safetensors \
    --moe_ckpt runs/face-mogle/global_local_mask_moe.pt \
    --pretrained_ckpt checkpoints/FLUX.1-dev \
    --config_path runs/face-mogle/config.yaml \
    --output_dir visualization/face-mogle
```
Face-MoGLE is evaluated along multiple dimensions, including:
- Generation Quality: FID, KID & CMMD
- Condition Alignment: Text Consistency & Mask Consistency
- Human Preference: IR (ImageReward)
FID & KID: https://github.com/GaParmar/clean-fid
Text Consistency: https://github.com/Taited/clip-score
```bash
python src/eval/eval_fid_kid_text.py \
    --fake_image visualization/face-mogle/face \
    --real_face_dir visualization/mmcelebahq/face \
    --real_text_dir visualization/mmcelebahq/text
```
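The FID/KID numbers can also be reproduced directly with the clean-fid API linked above. A minimal sketch, using the same real/fake directories as the command above:

```python
# Sketch: computing FID/KID directly with clean-fid.
from cleanfid import fid

real_dir = "visualization/mmcelebahq/face"
fake_dir = "visualization/face-mogle/face"

print("FID:", fid.compute_fid(real_dir, fake_dir))
print("KID:", fid.compute_kid(real_dir, fake_dir))
```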
CMMD:

```bash
cd src/eval/eval_cmmd && python eval_cmmd.py <gt_dir> <pred_dir>
```
Mask Consistency: [Splice](https://github.com/omerbt/Splice)
```bash
python src/eval/eval_mask.py \
    --real_dir visualization/mmcelebahq/face \
    --fake_img visualization/face-mogle/face
```
```bash
python src/eval/eval_ir.py \
    --image_path visualization/face-mogle/face \
    --text_path visualization/mmcelebahq/text
```
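For reference, a single image-text pair can also be scored with the ImageReward package directly. This is a sketch assuming IR denotes ImageReward and that the package is installed (`pip install image-reward`); `eval_ir.py` above remains the canonical way to compute the metric, and the image path below is illustrative.

```python
# Sketch: scoring one generated face with ImageReward (pip install image-reward).
import ImageReward as RM

model = RM.load("ImageReward-v1.0")  # downloads the reward model on first use
prompt = "She is wearing lipstick. She is attractive and has straight hair."
rewards = model.score(prompt, ["visualization/face-mogle/face/0.jpg"])  # illustrative path
print("ImageReward:", rewards)
```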
More visualization results are available on Hugging Face and can be used for comparison in your paper. Please cite our work if you find it useful.
Results are provided for both Mask-to-Face Generation and Text-to-Face Generation on:
- MM-FFHQ-Female
- MM-FairFace-HQ
```bibtex
@misc{face-mogle,
      title={Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation},
      author={Xuechao Zou and Shun Zhang and Xing Fu and Yue Li and Kai Li and Yushe Cao and Congyan Lang and Pin Tao and Junliang Xing},
      year={2025},
      eprint={2509.00428},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.00428},
}
```
This project is licensed under the Apache License 2.0. See the LICENSE file for details.