
Face-MoGLE

Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation

arXiv Paper · Project Page
Hugging Face Models · Hugging Face Datasets · Daily Papers · Hugging Face Spaces

teaser

⚙️ Installation

conda create -n face-mogle python=3.11.11
conda activate face-mogle
pip install -r requirements.txt

🏋️ Pretrained Weights

Download Checkpoints

Before running inference, testing, or the Gradio demo, please download the following files:

Directory Setup

After downloading, please place the files in the following structure:

Face-MoGLE
├── ...
├── checkpoints
│   ├── FLUX.1-dev
├── runs
│   ├── face-mogle
│   │   ├── pytorch_lora_weights.safetensors
│   │   ├── global_local_mask_moe.pt
│   │   ├── config.yaml
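
A quick sanity check can confirm these files are in place before running anything. This is a minimal sketch of ours, not part of the repo; the path list mirrors the tree above:

```python
from pathlib import Path

# Files/directories expected by inference and testing, per the layout above.
REQUIRED = [
    "checkpoints/FLUX.1-dev",
    "runs/face-mogle/pytorch_lora_weights.safetensors",
    "runs/face-mogle/global_local_mask_moe.pt",
    "runs/face-mogle/config.yaml",
]

def missing_files(root="."):
    """Return the required paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in REQUIRED if not (root / p).exists()]

if __name__ == "__main__":
    gone = missing_files()
    if gone:
        print("Missing:", *gone, sep="\n  ")
    else:
        print("All checkpoints found.")
```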

🖼️ Inference

  • Text-to-Face Generation
python inference.py --prompt "She is wearing lipstick. She is attractive and has straight hair."
  • Mask-to-Face Generation
python inference.py --mask "assets/readme_demo/27000.png"
  • (Text+Mask)-to-Face Generation
python inference.py \
    --prompt "She is wearing lipstick. She is attractive and has straight hair." \
    --mask "assets/readme_demo/27000.png"
| Text Prompt | Semantic Mask | Generated Face |
| --- | --- | --- |
| "She is wearing lipstick. She is attractive and has straight hair." | (none) | Text2Face output |
| (none) | Mask | Mask2Face output |
| "She is wearing lipstick. She is attractive and has straight hair." | Mask | (Text+Mask)2Face output |
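
The three modes differ only in which flags are passed, so batch runs can be scripted. A small helper of ours (only `--prompt` and `--mask` come from the commands above; the function itself is hypothetical):

```python
def build_inference_cmd(prompt=None, mask=None):
    """Assemble an inference.py argv list from an optional prompt and/or mask."""
    if prompt is None and mask is None:
        raise ValueError("Provide a prompt, a mask, or both.")
    cmd = ["python", "inference.py"]
    if prompt is not None:
        cmd += ["--prompt", prompt]
    if mask is not None:
        cmd += ["--mask", mask]
    return cmd

# Example: (Text+Mask)-to-Face
print(build_inference_cmd(
    prompt="She is wearing lipstick. She is attractive and has straight hair.",
    mask="assets/readme_demo/27000.png",
))
```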

🌐 Gradio Demo (Web UI)

CUDA_VISIBLE_DEVICES=0 python gradio_app.py

Demo.mp4

📦 Prepare Data

Download Datasets

You can download the datasets from Hugging Face:

| Dataset Name | Download Link | Usage |
| --- | --- | --- |
| MM-CelebA-HQ | Hugging Face (also available in TediGAN) | Training & Evaluation |
| MM-FairFace-HQ | Hugging Face | Zero-shot Generalization Validation only |
| MM-FFHQ-Female | Hugging Face | Zero-shot Generalization Validation only |

Note:

The MM-FairFace-HQ and MM-FFHQ-Female datasets are multimodal extensions we constructed from the original face image datasets using a semi-automated annotation approach.

Dataset Structure

After extraction, please organize the directory as follows:

Face-MoGLE
├── ...
├── data
│   ├── mmcelebahq
│   │   ├── face
│   │   │   ├── 0.jpg
│   │   │   ├── 1.jpg
│   │   ├── mask
│   │   │   ├── 0.png
│   │   │   ├── 1.png
│   │   ├── text
│   │   │   ├── 0.txt
│   │   │   ├── 1.txt
│   │   ├── text.json
│   ├── mmffhqfemale
│   │   ├── face
│   │   │   ├── 00001.jpg
│   │   │   ├── 00002.jpg
│   │   ├── mask
│   │   │   ├── 00001.png
│   │   │   ├── 00002.png
│   │   ├── text
│   │   │   ├── 00001.txt
│   │   │   ├── 00002.txt
│   │   ├── text.json
│   ├── mmfairfacehq
│   │   ├── face
│   │   │   ├── 52.jpg
│   │   │   ├── 55.jpg
│   │   ├── mask
│   │   │   ├── 52.png
│   │   │   ├── 55.png
│   │   ├── text
│   │   │   ├── 52.txt
│   │   │   ├── 55.txt
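
Each sample is a (face, mask, text) triple sharing a numeric stem. A sketch of how such a split might be enumerated (the helper is ours; the layout matches the tree above):

```python
from pathlib import Path

def list_triples(root):
    """Yield (face, mask, text) paths that share a filename stem under `root`."""
    root = Path(root)
    for face in sorted((root / "face").glob("*.jpg")):
        mask = root / "mask" / f"{face.stem}.png"
        text = root / "text" / f"{face.stem}.txt"
        # Skip incomplete samples rather than failing mid-run.
        if mask.exists() and text.exists():
            yield face, mask, text
```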

🚀 Training

bash script/train_face-mogle.sh

🧪 Testing

python test.py \
  --root data/mmcelebahq \
  --lora_ckpt runs/face-mogle/pytorch_lora_weights.safetensors \
  --moe_ckpt runs/face-mogle/global_local_mask_moe.pt \
  --pretrained_ckpt checkpoints/FLUX.1-dev \
  --config_path runs/face-mogle/config.yaml \
  --output_dir visualization/face-mogle

📊 Evaluation

Face-MoGLE is evaluated across multiple dimensions, including:

  • Generation Quality: FID, KID, and CMMD
  • Condition Alignment: Text Consistency and Mask Consistency
  • Human Preference: ImageReward (IR)

FID / KID / Text Consistency

FID & KID: https://github.com/GaParmar/clean-fid

Text Consistency: https://github.com/Taited/clip-score

python src/eval/eval_fid_kid_text.py \
    --fake_image visualization/face-mogle/face \
    --real_face_dir visualization/mmcelebahq/face \
    --real_text_dir visualization/mmcelebahq/text
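
At its core, CLIP-based text consistency is a cosine similarity between image and text embeddings, scaled by a constant. A minimal sketch assuming the embeddings are already computed (the function names and the scale `w` are ours, not from the clip-score repo):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def clip_text_consistency(image_emb, text_emb, w=100.0):
    """CLIPScore-style value: w * max(cos, 0). The scale w varies by implementation."""
    return w * max(cosine_similarity(image_emb, text_emb), 0.0)
```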

CMMD (CLIP Maximum Mean Discrepancy)

CMMD: https://github.com/sayakpaul/cmmd-pytorch

cd src/eval/eval_cmmd && python eval_cmmd.py <gt_dir> <pred_dir>
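
CMMD replaces FID's Inception features and Gaussian assumption with an MMD distance between CLIP embedding sets under a Gaussian RBF kernel. A minimal NumPy sketch of the (biased) squared-MMD estimator, with the CLIP embedding step assumed to have happened already; the bandwidth choice here is illustrative, not the one used by cmmd-pytorch:

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian RBF kernel matrix between rows of a and b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased squared MMD between two embedding sets (rows = samples)."""
    kxx = rbf_kernel(x, x, bandwidth).mean()
    kyy = rbf_kernel(y, y, bandwidth).mean()
    kxy = rbf_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy
```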

Mask Consistency (DINO Structure Distance)

Mask Consistency: https://github.com/omerbt/Splice

python src/eval/eval_mask.py \
    --real_dir visualization/mmcelebahq/face \
    --fake_img visualization/face-mogle/face
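
The Splice-style structure distance compares self-similarity matrices of patch features (DINO keys in the original). A hedged sketch of that idea with the feature extraction step assumed and a mean-absolute-difference aggregation of our choosing (Splice itself aggregates differently):

```python
import numpy as np

def self_similarity(feats):
    """Cosine self-similarity matrix of L2-normalized patch features (rows = patches)."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return f @ f.T

def structure_distance(feats_a, feats_b):
    """Mean absolute difference between the two self-similarity maps."""
    return np.abs(self_similarity(feats_a) - self_similarity(feats_b)).mean()
```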

IR (ImageReward)

IR: https://github.com/THUDM/ImageReward

python src/eval/eval_ir.py \
    --image_path visualization/face-mogle/face \
    --text_path visualization/mmcelebahq/text

👀 Visualization

More visualization results are available on Hugging Face and can be used for comparison in your paper. Please cite our work if you find them useful.

Monomodal Generation

Mask-to-Face Generation Text-to-Face Generation

Multimodal Generation

Ablation Study

Zero-Shot Generalization

  • MM-FFHQ-Female

  • MM-FairFace-HQ

📚 Citation

@misc{face-mogle,
      title={Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation}, 
      author={Xuechao Zou and Shun Zhang and Xing Fu and Yue Li and Kai Li and Yushe Cao and Congyan Lang and Pin Tao and Junliang Xing},
      year={2025},
      eprint={2509.00428},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.00428}, 
}

📜 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
