
Face-MoGLE

Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation

arXiv Paper · Project Page
Hugging Face Models · Hugging Face Datasets · Daily Papers · Hugging Face Spaces

teaser

⚙️ Installation

conda create -n face-mogle python=3.11.11
conda activate face-mogle
pip install -r requirements.txt

🏋️ Pretrained Weights

Download Checkpoints

Before running inference, testing, or the Gradio demo, please download the following files:

Directory Setup

After downloading, please place the files in the following structure:

Face-MoGLE
├── ...
├── checkpoints
│   ├── FLUX.1-dev
├── runs
│   ├── face-mogle
│   │   ├── pytorch_lora_weights.safetensors
│   │   ├── global_local_mask_moe.pt
│   │   ├── config.yaml
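
A quick sanity check can confirm these files are in place before running anything. This is a minimal sketch of ours, not part of the repo; the path list mirrors the tree above:

```python
from pathlib import Path

# Files/directories expected by inference and testing, per the layout above.
REQUIRED = [
    "checkpoints/FLUX.1-dev",
    "runs/face-mogle/pytorch_lora_weights.safetensors",
    "runs/face-mogle/global_local_mask_moe.pt",
    "runs/face-mogle/config.yaml",
]

def missing_files(root="."):
    """Return the required paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in REQUIRED if not (root / p).exists()]

if __name__ == "__main__":
    gone = missing_files()
    if gone:
        print("Missing:", *gone, sep="\n  ")
    else:
        print("All checkpoints found.")
```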

🖼️ Inference

  • Text-to-Face Generation
python inference.py --prompt "She is wearing lipstick. She is attractive and has straight hair."
  • Mask-to-Face Generation
python inference.py --mask "assets/readme_demo/27000.png"
  • (Text+Mask)-to-Face Generation
python inference.py \
    --prompt "She is wearing lipstick. She is attractive and has straight hair." \
    --mask "assets/readme_demo/27000.png"
| Text Prompt | Semantic Mask | Generated Face |
| --- | --- | --- |
| "She is wearing lipstick. She is attractive and has straight hair." | (none) | Text2Face output |
| (none) | Mask | Mask2Face output |
| "She is wearing lipstick. She is attractive and has straight hair." | Mask | (Text+Mask)2Face output |
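
The three modes differ only in which flags are passed, so batch runs can be scripted. A small helper of ours (only `--prompt` and `--mask` come from the commands above; the function itself is hypothetical):

```python
def build_inference_cmd(prompt=None, mask=None):
    """Assemble an inference.py argv list from an optional prompt and/or mask."""
    if prompt is None and mask is None:
        raise ValueError("Provide a prompt, a mask, or both.")
    cmd = ["python", "inference.py"]
    if prompt is not None:
        cmd += ["--prompt", prompt]
    if mask is not None:
        cmd += ["--mask", mask]
    return cmd

# Example: (Text+Mask)-to-Face
print(build_inference_cmd(
    prompt="She is wearing lipstick. She is attractive and has straight hair.",
    mask="assets/readme_demo/27000.png",
))
```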

🌐 Gradio Demo (Web UI)

CUDA_VISIBLE_DEVICES=0 python gradio_app.py

Demo.mp4

📦 Prepare Data

Download Datasets

You can download the datasets from Hugging Face:

| Dataset Name | Download Link | Usage |
| --- | --- | --- |
| MM-CelebA-HQ | Hugging Face (also available in TediGAN) | Training & Evaluation |
| MM-FairFace-HQ | Hugging Face | Zero-shot Generalization Validation only |
| MM-FFHQ-Female | Hugging Face | Zero-shot Generalization Validation only |

Note:

The MM-FairFace-HQ and MM-FFHQ-Female datasets are multimodal extensions we constructed from the original face image datasets using a semi-automated annotation approach.

Dataset Structure

After extraction, please organize the directory as follows:

Face-MoGLE
├── ...
├── data
│   ├── mmcelebahq
│   │   ├── face
│   │   │   ├── 0.jpg
│   │   │   ├── 1.jpg
│   │   ├── mask
│   │   │   ├── 0.png
│   │   │   ├── 1.png
│   │   ├── text
│   │   │   ├── 0.txt
│   │   │   ├── 1.txt
│   │   ├── text.json
│   ├── mmffhqfemale
│   │   ├── face
│   │   │   ├── 00001.jpg
│   │   │   ├── 00002.jpg
│   │   ├── mask
│   │   │   ├── 00001.png
│   │   │   ├── 00002.png
│   │   ├── text
│   │   │   ├── 00001.txt
│   │   │   ├── 00002.txt
│   │   ├── text.json
│   ├── mmfairfacehq
│   │   ├── face
│   │   │   ├── 52.jpg
│   │   │   ├── 55.jpg
│   │   ├── mask
│   │   │   ├── 52.png
│   │   │   ├── 55.png
│   │   ├── text
│   │   │   ├── 52.txt
│   │   │   ├── 55.txt
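
Each sample is a (face, mask, text) triple sharing a numeric stem. A sketch of how such a split might be enumerated (the helper is ours; the layout matches the tree above):

```python
from pathlib import Path

def list_triples(root):
    """Yield (face, mask, text) paths that share a filename stem under `root`."""
    root = Path(root)
    for face in sorted((root / "face").glob("*.jpg")):
        mask = root / "mask" / f"{face.stem}.png"
        text = root / "text" / f"{face.stem}.txt"
        # Skip incomplete samples rather than failing mid-run.
        if mask.exists() and text.exists():
            yield face, mask, text
```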

🚀 Training

bash script/train_face-mogle.sh

🧪 Testing

python test.py \
  --root data/mmcelebahq \
  --lora_ckpt runs/face-mogle/pytorch_lora_weights.safetensors \
  --moe_ckpt runs/face-mogle/global_local_mask_moe.pt \
  --pretrained_ckpt checkpoints/FLUX.1-dev \
  --config_path runs/face-mogle/config.yaml \
  --output_dir visualization/face-mogle

📊 Evaluation

Face-MoGLE is evaluated across multiple dimensions, including:

  • Generation Quality: FID, KID, and CMMD
  • Condition Alignment: Text Consistency and Mask Consistency
  • Human Preference: ImageReward (IR)

FID / KID / Text Consistency

FID & KID: https://github.com/GaParmar/clean-fid

Text Consistency: https://github.com/Taited/clip-score

python src/eval/eval_fid_kid_text.py \
    --fake_image visualization/face-mogle/face \
    --real_face_dir visualization/mmcelebahq/face \
    --real_text_dir visualization/mmcelebahq/text
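
At its core, CLIP-based text consistency is a cosine similarity between image and text embeddings, scaled by a constant. A minimal sketch assuming the embeddings are already computed (the function names and the scale `w` are ours, not from the clip-score repo):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def clip_text_consistency(image_emb, text_emb, w=100.0):
    """CLIPScore-style value: w * max(cos, 0). The scale w varies by implementation."""
    return w * max(cosine_similarity(image_emb, text_emb), 0.0)
```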

CMMD (CLIP Maximum Mean Discrepancy)

CMMD: https://github.com/sayakpaul/cmmd-pytorch

cd src/eval/eval_cmmd && python eval_cmmd.py <gt_dir> <pred_dir>
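
CMMD replaces FID's Inception features and Gaussian assumption with an MMD distance between CLIP embedding sets under a Gaussian RBF kernel. A minimal NumPy sketch of the (biased) squared-MMD estimator, with the CLIP embedding step assumed to have happened already; the bandwidth choice here is illustrative, not the one used by cmmd-pytorch:

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian RBF kernel matrix between rows of a and b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased squared MMD between two embedding sets (rows = samples)."""
    kxx = rbf_kernel(x, x, bandwidth).mean()
    kyy = rbf_kernel(y, y, bandwidth).mean()
    kxy = rbf_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy
```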

Mask Consistency (DINO Structure Distance)

Mask Consistency: https://github.com/omerbt/Splice

python src/eval/eval_mask.py \
    --real_dir visualization/mmcelebahq/face \
    --fake_img visualization/face-mogle/face
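
The Splice-style structure distance compares self-similarity matrices of patch features (DINO keys in the original). A hedged sketch of that idea with the feature extraction step assumed and a mean-absolute-difference aggregation of our choosing (Splice itself aggregates differently):

```python
import numpy as np

def self_similarity(feats):
    """Cosine self-similarity matrix of L2-normalized patch features (rows = patches)."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return f @ f.T

def structure_distance(feats_a, feats_b):
    """Mean absolute difference between the two self-similarity maps."""
    return np.abs(self_similarity(feats_a) - self_similarity(feats_b)).mean()
```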

IR (ImageReward)

IR: https://github.com/THUDM/ImageReward

python src/eval/eval_ir.py \
    --image_path visualization/face-mogle/face \
    --text_path visualization/mmcelebahq/text

👀 Visualization

More visualization results are available on Hugging Face and can be used for comparison in your paper. Please cite our work if you find them useful.

Monomodal Generation

Mask-to-Face Generation Text-to-Face Generation

Multimodal Generation

Ablation Study

Zero-Shot Generalization

  • MM-FFHQ-Female

  • MM-FairFace-HQ

📚 Citation

@misc{face-mogle,
      title={Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation}, 
      author={Xuechao Zou and Shun Zhang and Xing Fu and Yue Li and Kai Li and Yushe Cao and Congyan Lang and Pin Tao and Junliang Xing},
      year={2025},
      eprint={2509.00428},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.00428}, 
}

📜 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
