Flame-Chasers/DiaNA

🔥【CVPR 2025】Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment

This repository offers the official implementation of DiaNA in PyTorch.

In the meantime, check out our related papers if you are interested:

  • 【AAAI 2024】 An Empirical Study of CLIP for Text-based Person Search [paper | code]
  • 【ACM MM 2023】 Text-based Person Search without Parallel Image-Text Data [paper]
  • 【IJCAI 2023】 RaSa: Relation and Sensitivity Aware Representation Learning for Text-based Person Search [paper | code]
  • 【ICASSP 2022】 Learning Semantic-Aligned Feature Representation for Text-based Person Search [paper | code]

📖 Overview

DiaNA is a novel dialogue-refined cross-modal framework for chat-based person retrieval that leverages two adaptive attribute refiner modules to bottleneck the conversational and visual information for fine-grained cross-modal alignment.

DiaNA Architecture

📌 TODO

  • ✅ Release code
  • ✅ Release checkpoints
  • ✅ Release dataset

🗂️ Data Preparation

🔹 Pretraining Dataset

  • MALS, a large-scale synthetic text-based person retrieval (TPR) dataset with 1.5M image-text pairs

🔹 Fine-tuning Dataset: ChatPedes

  1. Download images from CUHK-PEDES
  2. Download ChatPedes annotation files from here
  3. Organize the dataset as follows:
<ROOT>/ChatPedes
    - train_reid.json
    - test_reid.json
    - imgs00
        - cam_a
        - cam_b
        - ...
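
Before launching fine-tuning, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and it checks only the entries named explicitly in the tree above (the directory name `imgs00` is copied as written there).

```python
from pathlib import Path

# Entries taken from the layout above; "imgs00" is copied as written there.
REQUIRED_FILES = ("train_reid.json", "test_reid.json")
REQUIRED_DIRS = ("imgs00",)


def missing_entries(root):
    """Return the expected ChatPedes entries that are absent under `root`.

    `root` corresponds to <ROOT> in the layout above. This helper is a
    hypothetical convenience check, not something the repository ships.
    """
    base = Path(root) / "ChatPedes"
    missing = [name for name in REQUIRED_FILES if not (base / name).is_file()]
    missing += [name for name in REQUIRED_DIRS if not (base / name).is_dir()]
    return missing
```

Calling `missing_entries("<ROOT>")` with your own dataset root returns an empty list when every listed entry is present, and otherwise the names that still need to be downloaded or moved.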

🏋️‍♂️ Training


🔹 Stage 1: Pretraining on MALS

Run Pretraining:

cd DiaNA/train
bash shell/pretrain.sh

Resources:


🔹 Stage 2: Fine-tuning on ChatPedes

Run Fine-tuning:

cd DiaNA/train
bash shell/finetune.sh

Resources:

🎯 Evaluation

Run Evaluation:

cd DiaNA/eval
bash shell/eval.sh
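
The pretraining, fine-tuning, and evaluation commands can also be driven from a single Python entry point. This is a rough sketch under stated assumptions: `STAGES` and `run_stage` are hypothetical names, the directories and script paths are copied from the commands in this README, and the script is assumed to be launched from the directory that contains `DiaNA/`.

```python
import subprocess

# (working directory, script) pairs copied from the commands in this README.
STAGES = {
    "pretrain": ("DiaNA/train", "shell/pretrain.sh"),
    "finetune": ("DiaNA/train", "shell/finetune.sh"),
    "eval": ("DiaNA/eval", "shell/eval.sh"),
}


def run_stage(name, dry_run=False):
    """Run one stage's script from its directory and return what was used.

    With dry_run=True nothing is executed; the working directory and
    command are only returned, which is handy for checking paths first.
    """
    workdir, script = STAGES[name]
    command = ["bash", script]
    if not dry_run:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(command, cwd=workdir, check=True)
    return workdir, command
```

For example, `run_stage("finetune", dry_run=True)` reports the directory and command without running anything, while `run_stage("finetune")` executes the script and stops with an error if it fails.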

Resources:

✨ Citation

If you find this paper useful, please consider starring 🌟 this repo and citing 📑 our paper:

@InProceedings{bai2025chat,
    author    = {Bai, Yang and Ji, Yucheng and Cao, Min and Wang, Jinqiao and Ye, Mang},
    title     = {Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    pages     = {3952--3962},
    month     = {June},
    year      = {2025}
}

⚖️ License

This code is distributed under the MIT License.
