bigdata-inha/FaceGCD-Generalized-Face-Discovery-via-Dynamic-Prefix-Generation

FaceGCD: Generalized Face Discovery via Dynamic Prefix Generation [BMVC2025]

Official PyTorch implementation of "FaceGCD: Generalized Face Discovery via Dynamic Prefix Generation".

📄 Paper: arXiv:2507.22353


📋 Overview

Recognizing and differentiating among both familiar and unfamiliar faces is a critical capability for face recognition systems and a key step toward artificial general intelligence (AGI). This repository introduces Generalized Face Discovery (GFD), a novel open-world face recognition task that unifies traditional face identification with generalized category discovery (GCD).

Key Features

  • Novel Task: GFD requires recognizing both labeled and unlabeled known identities (IDs) while simultaneously discovering new, previously unseen IDs
  • Dynamic Prefix Generation: Instance-specific feature extractors using lightweight, layer-wise prefixes generated on-the-fly by a HyperNetwork
  • State-of-the-art Performance: Significantly outperforms existing GCD methods and the ArcFace baseline on fine-grained face recognition tasks
  • High Cardinality Support: Handles hundreds or thousands of visually similar face IDs effectively

FaceGCD Architecture


🎯 Key Contributions

  1. Generalized Face Discovery (GFD): A new task formulation that bridges face identification and clustering in open-world scenarios
  2. Dynamic Prefix Mechanism: HyperNetwork-based prefix generators that create instance-specific feature extractors without massive model capacity
  3. Comprehensive Benchmarks: Six GFD benchmark datasets (YTF-500/1000/2000, CASIA-500/1000/2000)
  4. Strong Generalization: Competitive performance on generic GCD benchmarks (CIFAR-100, ImageNet-100, CUB, etc.)
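
The dynamic prefix idea described above can be illustrated with a minimal PyTorch sketch. This is not the repository's `model/prefix_generator.py`: the class `PrefixHyperNet`, the function `prefix_attention`, the MLP layout, the use of a mean-pooled token as the instance embedding, and all dimensions are assumptions made for illustration. It only shows the general pattern of a hypernetwork emitting per-layer prefix keys and values that are prepended inside attention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrefixHyperNet(nn.Module):
    """Hypothetical hypernetwork: one instance embedding -> per-layer prefix keys/values."""

    def __init__(self, embed_dim=64, num_layers=2, prefix_length=10, hidden=128):
        super().__init__()
        self.num_layers = num_layers
        self.prefix_length = prefix_length
        self.embed_dim = embed_dim
        # A learned embedding per layer lets one shared MLP emit different prefixes per layer.
        self.layer_embed = nn.Embedding(num_layers, embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, 2 * prefix_length * embed_dim),
        )

    def forward(self, instance_embed):  # instance_embed: (B, D)
        batch = instance_embed.shape[0]
        prefixes = []
        for layer in range(self.num_layers):
            idx = torch.full((batch,), layer, dtype=torch.long, device=instance_embed.device)
            h = torch.cat([instance_embed, self.layer_embed(idx)], dim=-1)  # (B, 2D)
            kv = self.mlp(h).view(batch, 2, self.prefix_length, self.embed_dim)
            prefixes.append((kv[:, 0], kv[:, 1]))  # prefix keys and values, each (B, P, D)
        return prefixes


def prefix_attention(x, prefix_k, prefix_v, wq, wk, wv):
    """Single-head attention with generated prefixes prepended to keys and values."""
    q = wq(x)                                    # (B, N, D)
    k = torch.cat([prefix_k, wk(x)], dim=1)      # (B, P + N, D)
    v = torch.cat([prefix_v, wv(x)], dim=1)      # (B, P + N, D)
    attn = F.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)  # (B, N, P + N)
    return attn @ v                              # (B, N, D)


torch.manual_seed(0)
B, N, D = 4, 16, 64
tokens = torch.randn(B, N, D)                    # stand-in for patch tokens
hyper = PrefixHyperNet(embed_dim=D, num_layers=2, prefix_length=10)
prefixes = hyper(tokens.mean(dim=1))             # assumption: mean token as instance embedding
wq, wk, wv = (nn.Linear(D, D) for _ in range(3))
out = prefix_attention(tokens, *prefixes[0], wq, wk, wv)
print(out.shape)
```

Because the prefixes depend on the input, each image effectively gets its own feature extractor while the backbone weights stay shared, which is the property the contribution list highlights.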

🚀 Installation

Requirements

  • Python 3.8+
  • PyTorch 2.3.1+
  • CUDA 11.8+
  • APEX (for mixed precision training)

Setup

  1. Clone the repository:
git clone https://github.com/bigdata-inha/FaceGCD-Generalized-Face-Discovery-via-Dynamic-Prefix-Generation.git FaceGCD
cd FaceGCD
  2. Create a conda environment:
conda create -n facegcd python=3.8
conda activate facegcd
  3. Install dependencies:
pip install -r requirements.txt
  4. Install APEX (optional but recommended for faster training):
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

📊 Dataset Preparation

GFD Benchmark Datasets

We provide six benchmark datasets for evaluating the Generalized Face Discovery task:

Dataset       Known IDs   Unknown IDs   Train Samples   Test Samples
YTF-500             250           250          48,089         11,779
YTF-1000            500           500          96,002         23,523
YTF-2000          1,000         1,000         190,248         46,615
CASIA-500           250           250          46,991         11,999
CASIA-1000          500           500          89,508         22,867
CASIA-2000        1,000         1,000         184,432         47,114
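
As a quick sanity check on the table, the short Python sketch below recomputes the total number of identities and the test share for each benchmark. The numbers are copied from the table; the dictionary name `BENCHMARKS` and the helper `summarize` are illustrative and not part of the repository.

```python
# name -> (known IDs, unknown IDs, train samples, test samples), copied from the table
BENCHMARKS = {
    "YTF-500": (250, 250, 48_089, 11_779),
    "YTF-1000": (500, 500, 96_002, 23_523),
    "YTF-2000": (1_000, 1_000, 190_248, 46_615),
    "CASIA-500": (250, 250, 46_991, 11_999),
    "CASIA-1000": (500, 500, 89_508, 22_867),
    "CASIA-2000": (1_000, 1_000, 184_432, 47_114),
}


def summarize(name):
    """Derive simple ratios for one benchmark from the tabulated counts."""
    known, unknown, train, test = BENCHMARKS[name]
    total_ids = known + unknown
    return {
        "total_ids": total_ids,
        "known_fraction": known / total_ids,
        "test_fraction": test / (train + test),
    }


for name in BENCHMARKS:
    s = summarize(name)
    print(f"{name}: {s['total_ids']} IDs, "
          f"{s['known_fraction']:.0%} known, test share {s['test_fraction']:.1%}")
```

In every benchmark the number in the dataset name equals known plus unknown IDs, which is also the value passed as `--K` in the evaluation command for YTF-1000 in this README.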

Download

Datasets and pretrained checkpoints will be available soon via Google Drive.

Once downloaded, organize the data as follows:

FaceGCD/
├── youtube_faces_500/
│   ├── train/
│   └── test/
├── youtube_faces_1000/
│   ├── train/
│   └── test/
├── youtube_faces_2000/
│   ├── train/
│   └── test/
├── casia_webface_500/
│   ├── train/
│   └── test/
├── casia_webface_1000/
│   ├── train/
│   └── test/
└── casia_webface_2000/
    ├── train/
    └── test/

Pretrained Weights

Download the DINO pretrained weights (checkpoint.pth) and place them in the project root directory.


πŸ‹οΈ Training

Train on YouTube Faces 1000

bash shell/train_youtube1000.sh

Or run directly:

CUDA_VISIBLE_DEVICES=0,1 torchrun \
    --nproc_per_node=2 \
    --master_port=55411 \
    train.py \
    --pretrained \
    --return-embed \
    --save-images \
    --pin-mem \
    --layer-embed \
    --experiment gcd_youtubefaces_1000_part_fvit_norm_prefix10 \
    --amp \
    --prefix_tuning \
    --prefix_length 10 \
    --data-path youtube_faces_1000 \
    --dataset youtubefaces_1000 \
    --amp-impl apex \
    --pretrained_weights checkpoint.pth \
    --log-wandb \
    --patch_size 8 \
    --input-size 3 112 112

Key Training Arguments

  • --prefix_tuning: Enable dynamic prefix generation
  • --prefix_length: Number of prefix tokens (default: 10)
  • --dataset: Choose from youtubefaces_500, youtubefaces_1000, youtubefaces_2000, casia_500, casia_1000, casia_2000
  • --data-path: Path to dataset directory
  • --pretrained_weights: Path to DINO pretrained checkpoint
  • --log-wandb: Enable Weights & Biases logging
  • --amp: Enable mixed precision training
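
To switch between benchmarks without editing shell scripts by hand, the arguments above can be assembled programmatically. The sketch below is an illustration, not a script shipped with the repository: `build_train_command` and the default experiment name are hypothetical, and the dataset-to-directory mapping for the 2000-ID and CASIA-500 variants is inferred from the directory layout rather than stated in a command in this README. Only flags that appear in the training command above are used.

```python
import shlex

# --dataset value -> --data-path directory (partly inferred from the directory layout)
DATA_PATHS = {
    "youtubefaces_500": "youtube_faces_500",
    "youtubefaces_1000": "youtube_faces_1000",
    "youtubefaces_2000": "youtube_faces_2000",
    "casia_500": "casia_webface_500",
    "casia_1000": "casia_webface_1000",
    "casia_2000": "casia_webface_2000",
}


def build_train_command(dataset, prefix_length=10, nproc=2, master_port=55411,
                        weights="checkpoint.pth", experiment=None, log_wandb=False):
    """Return the torchrun argument list for train.py on one benchmark."""
    if dataset not in DATA_PATHS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    # Hypothetical default naming scheme for the experiment tag.
    experiment = experiment or f"gcd_{dataset}_prefix{prefix_length}"
    cmd = [
        "torchrun", f"--nproc_per_node={nproc}", f"--master_port={master_port}",
        "train.py",
        "--pretrained", "--return-embed", "--save-images", "--pin-mem", "--layer-embed",
        "--experiment", experiment,
        "--amp", "--amp-impl", "apex",
        "--prefix_tuning", "--prefix_length", str(prefix_length),
        "--data-path", DATA_PATHS[dataset],
        "--dataset", dataset,
        "--pretrained_weights", weights,
        "--patch_size", "8",
        "--input-size", "3", "112", "112",
    ]
    if log_wandb:
        cmd.append("--log-wandb")
    return cmd


command = build_train_command("casia_1000")
print(shlex.join(command))  # copy-pasteable shell line
```

The list form can be passed directly to `subprocess.run`, with `CUDA_VISIBLE_DEVICES` set in the environment as in the shell example.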

Training on Other Datasets

For CASIA-WebFace or different scales:

# CASIA-1000
python train.py \
    --dataset casia_1000 \
    --data-path casia_webface_1000 \
    --prefix_length 10 \
    --prefix_tuning \
    [other arguments...]

# YTF-500
python train.py \
    --dataset youtubefaces_500 \
    --data-path youtube_faces_500 \
    --prefix_length 10 \
    --prefix_tuning \
    [other arguments...]

🔍 Feature Extraction

After training, extract features for evaluation:

bash shell/feature_extract_youtube1000.sh

Or run directly:

CUDA_VISIBLE_DEVICES=0,1 torchrun \
    --nproc_per_node=2 \
    --master_port=25411 \
    extract_features.py \
    --pretrained \
    --return-embed \
    --save-images \
    --pin-mem \
    --experiment gcd_youtubefaces_1000_part_fvit_pretrain_prefix10 \
    --amp \
    --prefix_tuning \
    --prefix_length 10 \
    --data-path youtube_faces_1000 \
    --dataset youtubefaces_1000 \
    --amp-impl apex \
    --save_dir results \
    --landmark_cnn \
    --pretrained_weights checkpoint.pth \
    --checkpoint_weights model_best.pth.tar \
    --patch_size 8 \
    --input-size 3 112 112

This will save extracted features to the results/ directory.


📈 Evaluation

Perform Semi-Supervised K-Means clustering on extracted features:

bash shell/semi_supervised_k_means_youtube1000.sh

Or run directly:

CUDA_VISIBLE_DEVICES=0,1 torchrun \
    --nproc_per_node=2 \
    --master_port=32411 \
    SSK.py \
    --experiment GCD_base_prefix_gen \
    --dataset youtubefaces_1000 \
    --K 1000 \
    --max_kmeans_iter 500 \
    --k_means_init 10 \
    --experiment_idx gcd_youtubefaces_1000_part_fvit_pretrain_prefix10 \
    --save_dir results \
    --data-path youtube_faces_1000
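
For readers unfamiliar with semi-supervised K-Means, the idea can be sketched in plain Python as follows. This is a toy illustration on 2-D points, not the repository's `trainer/faster_mix_k_means_pytorch.py`: the function name, the farthest-point initialization of the extra centroids, and the fixed iteration count are assumptions. The key constraint shown is that labeled samples stay pinned to their own class cluster while unlabeled samples move to the nearest centroid. It assumes at least one labeled class.

```python
import math


def semi_supervised_kmeans(labeled, unlabeled, k, iters=20):
    """labeled: list of (point, class_id); unlabeled: list of points; k: total clusters.

    Returns (centroids, cluster index for each unlabeled point).
    """
    dim = len(unlabeled[0])
    known = sorted({c for _, c in labeled})

    def mean(points):
        return [sum(p[d] for p in points) / len(points) for d in range(dim)]

    # Known-class centroids start at the mean of their labeled samples.
    centroids = [mean([p for p, c in labeled if c == cls]) for cls in known]
    # Assumption: extra centroids start at the unlabeled point farthest from existing ones.
    while len(centroids) < k:
        far = max(unlabeled, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    assign = []
    for _ in range(iters):
        # Unlabeled points go to the nearest centroid.
        assign = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in unlabeled]
        # Labeled points are always counted in their own class cluster.
        members = [[] for _ in range(k)]
        for p, c in labeled:
            members[known.index(c)].append(p)
        for p, j in zip(unlabeled, assign):
            members[j].append(p)
        centroids = [mean(m) if m else centroids[j] for j, m in enumerate(members)]
    return centroids, assign


labeled = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((5.0, 5.0), 1), ((5.0, 6.0), 1)]
unlabeled = [(0.2, 0.1), (5.1, 5.2), (10.0, 10.0), (10.2, 9.9)]
centroids, assign = semi_supervised_kmeans(labeled, unlabeled, k=3)
print(assign)  # clusters 0 and 1 are the known classes; cluster 2 is a discovered one
```

In the toy data, the two points near (10, 10) belong to no labeled class and end up in the third cluster, which mirrors how new identities are discovered among unlabeled faces.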

Evaluation Metrics

  • ACC (Clustering Accuracy): Overall clustering accuracy
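
Clustering accuracy is usually defined as the accuracy under the best one-to-one mapping between predicted cluster IDs and ground-truth identities. The sketch below shows that definition on a tiny example using brute-force search over mappings; it is an illustration only and is not taken from `utils/cluster_utils.py`. Brute force is factorial in the number of clusters, so evaluation code at benchmark scale typically solves the same assignment problem with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations


def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping from cluster IDs to identity labels."""
    assert len(y_true) == len(y_pred) and len(y_true) > 0
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # Pad the shorter side with None so unmatched clusters or classes are allowed.
    size = max(len(classes), len(clusters))
    classes_padded = classes + [None] * (size - len(classes))
    clusters_padded = clusters + [None] * (size - len(clusters))
    # counts[(cluster, cls)] = number of samples with that cluster and true label
    counts = {}
    for t, p in zip(y_true, y_pred):
        counts[(p, t)] = counts.get((p, t), 0) + 1
    best = 0
    for perm in permutations(classes_padded):
        hits = sum(counts.get((c, k), 0) for c, k in zip(clusters_padded, perm))
        best = max(best, hits)
    return best / len(y_true)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 0]   # cluster IDs are arbitrary; one sample of identity 2 is misplaced
acc = clustering_accuracy(y_true, y_pred)
print(f"ACC = {acc:.3f}")     # ACC = 0.833
```

The mapping step matters because cluster indices carry no meaning on their own: a clustering that is perfect up to renaming of clusters scores 1.0.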

📊 Results

FaceGCD Architecture


πŸ—οΈ Project Structure

FaceGCD/
β”œβ”€β”€ data_loader/              # Dataset loaders and augmentations
β”‚   β”œβ”€β”€ augmentations/        # Data augmentation strategies
β”‚   β”œβ”€β”€ youtube_faces_*.py    # YTF dataset loaders
β”‚   β”œβ”€β”€ casia_webface_*.py    # CASIA dataset loaders
β”‚   └── data_loaders.py       # Main data loading utilities
β”œβ”€β”€ model/                    # Model architectures
β”‚   β”œβ”€β”€ dino_vision_transformer.py  # DINO ViT backbone
β”‚   β”œβ”€β”€ prefix_generator.py   # HyperNetwork-based prefix generator
β”‚   β”œβ”€β”€ ViT_face.py           # Face-specific ViT components
β”‚   └── mobilenet.py          # Landmark CNN
β”œβ”€β”€ trainer/                  # Training utilities
β”‚   β”œβ”€β”€ trainer.py            # Main training loop
β”‚   └── faster_mix_k_means_pytorch.py  # Semi-supervised K-Means
β”œβ”€β”€ utils/                    # Utility functions
β”‚   β”œβ”€β”€ cluster_utils.py      # Clustering utilities
β”‚   β”œβ”€β”€ losses.py             # Loss functions
β”‚   └── dino_utils.py         # DINO-specific utilities
β”œβ”€β”€ shell/                    # Shell scripts for experiments
β”œβ”€β”€ train.py                  # Training script
β”œβ”€β”€ extract_features.py       # Feature extraction script
β”œβ”€β”€ SSK.py                    # Semi-supervised K-Means evaluation
└── requirements.txt          # Python dependencies

πŸ“ Citation

If you find this work useful for your research, please cite:

@article{oh2025facegcd,
  title={FaceGCD: Generalized Face Discovery via Dynamic Prefix Generation},
  author={Oh, Yunseok and Choi, Dong-Wan},
  journal={arXiv preprint arXiv:2507.22353},
  year={2025}
}

πŸ™ Acknowledgments

This work is built upon several excellent projects:

  • DINO - Self-supervised Vision Transformers
  • GCD - Generalized Category Discovery
  • ArcFace - Face Recognition baseline
  • timm - PyTorch Image Models
  • APEX - Mixed Precision Training

📧 Contact

For questions or issues, please open an issue in this repository.


📜 License

This project is released under the MIT License. See the LICENSE file for details.


🔄 Updates

  • [2025-10] Initial release of code and paper
  • [Coming Soon] Pretrained models and datasets will be available via Google Drive

Note: This repository is actively maintained. Dataset and checkpoint download links will be updated once the upload to Google Drive is complete.
