PLIP is a novel Language-Image Pre-training framework for generic Person representation learning that benefits a range of downstream person-centric tasks.
We also present a large-scale person dataset named SYNTH-PEDES to verify its effectiveness, in which the proposed Stylish Pedestrian Attributes-union Captioning (SPAC) method is used to synthesize diverse textual descriptions.
Experiments show that our model not only significantly improves existing methods on downstream tasks, but also performs strongly in few-shot and domain generalization settings. More details can be found in our paper PLIP: Language-Image Pre-training for Person Representation Learning.
- 🔥[05.31] The pre-trained model and zero-shot inference code are released!
SYNTH-PEDES is by far the largest person dataset with textual descriptions, built without any human annotation effort. Every person image has 2 or 3 different descriptions. The dataset will be released soon.
Some examples from our SYNTH-PEDES dataset:
A comparison of SYNTH-PEDES with other popular datasets:
We utilize ResNet50 and BERT as our encoders. After pre-training, we fine-tune and evaluate the performance on three downstream tasks. The checkpoints have been released at Baidu Yun.
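As a rough illustration of this dual-encoder design, here is a minimal sketch; the projection dimension, pooling choices, and class name are our own assumptions, not the released implementation.

```python
# Minimal sketch of a ResNet50 + BERT dual encoder projecting both modalities
# into a shared embedding space. Dimensions and pooling are illustrative only.
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from transformers import BertModel


class PersonDualEncoder(nn.Module):  # hypothetical name, not the repository's class
    def __init__(self, embed_dim=768):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)
        self.visual = nn.Sequential(*list(resnet.children())[:-1])  # drop the fc head
        self.visual_proj = nn.Linear(2048, embed_dim)
        self.textual = BertModel.from_pretrained("bert-base-uncased")
        self.text_proj = nn.Linear(self.textual.config.hidden_size, embed_dim)

    def encode_image(self, images):
        feats = self.visual(images).flatten(1)             # (B, 2048)
        return F.normalize(self.visual_proj(feats), dim=-1)

    def encode_text(self, input_ids, attention_mask):
        out = self.textual(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]                  # [CLS] token, (B, hidden)
        return F.normalize(self.text_proj(cls), dim=-1)
```

Both outputs are L2-normalized, so the dot product between an image embedding and a text embedding directly gives their cosine similarity.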
Text-based person Re-ID on CUHK-PEDES (Rank-1/Rank-10):

Pre-train | CMPM/C | SSAN | LGUR |
---|---|---|---|
IN sup | 54.81/83.22 | 61.37/86.73 | 64.21/87.93 |
IN unsup | 55.34/83.76 | 61.97/86.63 | 65.33/88.47 |
CLIP | 55.67/83.82 | 62.09/86.89 | 64.70/88.76 |
LUP | 57.21/84.68 | 63.91/88.36 | 65.42/89.36 |
LUP-NL | 57.35/84.77 | 63.71/87.46 | 64.68/88.69 |
PLIP(ours) | 69.23/91.16 | 64.91/88.39 | 67.22/89.49 |
Text-based person Re-ID on ICFG-PEDES (Rank-1/Rank-10):

Pre-train | CMPM/C | SSAN | LGUR |
---|---|---|---|
IN sup | 47.61/75.48 | 54.23/79.53 | 57.42/81.45 |
IN unsup | 48.34/75.66 | 55.27/79.64 | 59.90/82.94 |
CLIP | 48.12/75.51 | 53.58/78.96 | 58.35/82.02 |
LUP | 50.12/76.23 | 56.51/80.41 | 60.33/83.06 |
LUP-NL | 49.64/76.15 | 55.59/79.78 | 60.25/82.84 |
PLIP(ours) | 64.25/86.32 | 60.12/82.84 | 62.27/83.96 |
Image-based person Re-ID (mAP/Rank-1):

Methods | Market1501 | DukeMTMC |
---|---|---|
BOT | 85.9/94.5 | 76.4/86.4 |
BDB | 86.7/95.3 | 76.0/89.0 |
MGN | 87.5/95.1 | 79.4/89.0 |
ABDNet | 88.3/95.6 | 78.6/89.0 |
PLIP+BOT | 88.0/95.1 | 77.0/86.5 |
PLIP+BDB | 88.4/95.7 | 78.2/89.8 |
PLIP+MGN | 90.6/96.3 | 81.7/90.3 |
PLIP+ABDNet | 91.2/96.7 | 81.6/90.9 |
Pedestrian attribute recognition:

Methods | PETA | PA-100K | RAP |
---|---|---|---|
DeepMAR | 80.14/83.56 | 78.28/84.32 | 76.81/78.94 |
Rethink | 83.96/86.35 | 80.21/87.40 | 79.27/79.95 |
VTB | 84.12/86.63 | 81.02/87.31 | 81.43/80.63 |
Label2Label | 84.08/86.57 | 82.24/87.08 | 81.82/80.93 |
PLIP+DeepMAR | 82.46/85.87 | 80.33/87.24 | 78.96/80.12 |
PLIP+Rethink | 85.56/87.63 | 82.09/88.12 | 81.87/81.53 |
PLIP+VTB | 86.03/88.14 | 83.24/88.57 | 83.64/81.78 |
PLIP+Label2Label | 86.12/88.08 | 84.36/88.63 | 83.77/81.49 |
We use 4 RTX 3090 (24 GB) GPUs for training and evaluation.
Create and activate the conda environment:
conda create --name PLIP --file requirements.txt
conda activate PLIP
Download the CUHK-PEDES dataset from here and the ICFG-PEDES dataset from here.
Organize them in the `data` folder as follows:
|-- data/
|   |-- <CUHK-PEDES>/
|       |-- imgs
|           |-- cam_a
|           |-- cam_b
|           |-- ...
|       |-- reid_raw.json
|
|   |-- <ICFG-PEDES>/
|       |-- imgs
|           |-- test
|           |-- train
|       |-- ICFG_PEDES.json
|
|   |-- <SYNTH-PEDES>/
|       |-- Part1
|       |-- ...
|       |-- Part11
|       |-- synthpedes_dataset.json
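As a quick sanity check of this layout (file and folder names are taken from the tree above; adjust the root path to your setup):

```python
# Verify that the expected dataset files and folders are in place.
from pathlib import Path

root = Path("data")  # adjust if your data folder lives elsewhere
expected = [
    root / "CUHK-PEDES" / "imgs",
    root / "CUHK-PEDES" / "reid_raw.json",
    root / "ICFG-PEDES" / "imgs",
    root / "ICFG-PEDES" / "ICFG_PEDES.json",
    root / "SYNTH-PEDES" / "synthpedes_dataset.json",
]
for path in expected:
    print(f"{'found  ' if path.exists() else 'MISSING'} {path}")
```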
Our pre-trained model can be directly transferred to downstream tasks, especially text-based Re-ID.
- Run the following python file to generate the train/test/valid json files:
python dataset_split.py
- Then you can evaluate by running:
python zs_inference.py
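For intuition, zero-shot evaluation amounts to embedding the query captions and gallery images with the frozen pre-trained encoders and ranking gallery images by cosine similarity. A minimal sketch of that ranking step follows; it uses the hypothetical PersonDualEncoder from the sketch above and is not the actual zs_inference.py code.

```python
# Illustrative zero-shot text-to-image retrieval: rank gallery images by
# cosine similarity to each query caption (not the actual zs_inference.py code).
import torch


@torch.no_grad()
def rank_gallery(model, text_ids, text_mask, gallery_images, topk=10):
    model.eval()
    txt = model.encode_text(text_ids, text_mask)   # (Q, D), L2-normalized
    img = model.encode_image(gallery_images)       # (G, D), L2-normalized
    sims = txt @ img.t()                           # (Q, G) cosine similarities
    scores, indices = sims.topk(topk, dim=1)       # top-k gallery indices per query
    return scores, indices
```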
Almost all existing downstream person-centric methods can be improved by replacing their backbone with our pre-trained model. Taking CMPM/C as an example, the fine-tuning code will be released soon.
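Until then, the usual recipe is to initialize the downstream method's ResNet50 backbone from the released checkpoint. A hedged sketch follows; the checkpoint filename and the "visual." key prefix below are assumptions, so inspect the released state_dict for its actual layout.

```python
# Sketch: load PLIP's pre-trained image-encoder weights into a plain ResNet50
# backbone. Filename and key prefix are assumptions; check the released checkpoint.
import torch
import torchvision

backbone = torchvision.models.resnet50(weights=None)
ckpt = torch.load("plip_pretrain.pth", map_location="cpu")  # hypothetical filename
state = ckpt.get("state_dict", ckpt)

# Keep only image-encoder weights and strip the assumed "visual." prefix.
visual = {k.replace("visual.", "", 1): v for k, v in state.items() if k.startswith("visual.")}
missing, unexpected = backbone.load_state_dict(visual, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```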
The evaluation code will be released soon.
If you use PLIP in your research, please cite it with the following BibTeX entry:
@misc{zuo2023plip,
title={PLIP: Language-Image Pre-training for Person Representation Learning},
author={Jialong Zuo and Changqian Yu and Nong Sang and Changxin Gao},
year={2023},
eprint={2305.08386},
archivePrefix={arXiv},
primaryClass={cs.CV}
}