This repository contains the official implementation of the paper "Towards Transferable Personality Representation Learning based on Triplet Comparisons and Its Applications" (EMNLP 2025 Main Conference).
The code requires the following dependencies (see `requirements.txt`):

```
torch>=1.8.0
numpy
pandas
scikit-learn
transformers
datasets
tqdm
scipy
```
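The dependencies can be installed with `pip install -r requirements.txt`. As an optional sanity check (a sketch, not part of the repository), the following snippet reports which of them are not importable in the current environment:

```python
import importlib.util

# Import names for the packages above ("scikit-learn" imports as "sklearn")
REQUIRED = ["torch", "numpy", "pandas", "sklearn",
            "transformers", "datasets", "tqdm", "scipy"]

missing = [name for name in REQUIRED
           if importlib.util.find_spec(name) is None]
print("missing dependencies:", missing or "none")
```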
Download: The datasets are available on Google Drive: Download Datasets
Organization: The dataset contains three main folders:

- `utterances`: Raw single-sentence corpora.
- `triplet`: Generated and filtered triplets used for encoder training.
- `by-product`: By-product datasets used for downstream verification (personality detection).
After downloading, please organize the data files into the data/ directory.
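Assuming the three folder names above are kept as-is (an assumption; adjust to match the actual download), the expected layout under `data/` can be created with:

```shell
# Create the data/ layout described above
# (folder names assumed from the dataset description)
mkdir -p data/utterances data/triplet data/by-product
```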
The training process consists of two stages: Pre-training (Contrastive Learning) and Downstream Personality Detection. We use contrastive learning to fine-tune the BERT embeddings.
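For intuition only, a standard triplet margin loss, which pulls an anchor embedding toward a positive and away from a negative, can be sketched in NumPy as follows (this is a generic formulation, not the repository's exact training objective):

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss on Euclidean distances: encourages
    d(anchor, positive) + margin <= d(anchor, negative)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

# Toy 2-D embeddings: the negative is already farther than the positive
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])
n = np.array([3.0, 0.0])
print(triplet_margin_loss(a, p, n))  # 0.0 -- margin already satisfied
```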
1. Warm-up Training: This step performs Masked Language Modeling (MLM) or similar warm-up tasks.

```shell
cd embedding
bash scripts/train_warm_ml.sh
```

2. Contrastive Pre-training: Train the encoder using contrastive loss.

```shell
cd embedding
bash scripts/train.sh
```

The trained model will be saved in `embedding/output_embedding/` (or as configured in the script).
Train the classifier for personality traits (e.g., MBTI/Big5) using the pre-trained embeddings.
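Conceptually, each trait (e.g., one MBTI dimension) is a binary classification problem on top of the frozen pre-trained embeddings. A minimal NumPy sketch of such a per-trait classifier, using plain logistic regression trained by gradient descent (illustrative only; the repository's scripts implement the actual detector):

```python
import numpy as np

def train_trait_classifier(X, y, lr=0.1, steps=500):
    """Logistic regression on fixed embeddings X of shape (n, d)
    with binary trait labels y in {0, 1}. Returns weights and bias."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid
        w -= lr * (X.T @ (p - y) / n)           # gradient of the log loss
        b -= lr * np.mean(p - y)
    return w, b

# Toy "embeddings": two linearly separable clusters for one trait
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, (20, 4)), rng.normal(1, 0.3, (20, 4))])
y = np.array([0] * 20 + [1] * 20)
w, b = train_trait_classifier(X, y)
acc = np.mean(((X @ w + b) > 0).astype(int) == y)
print("train accuracy:", acc)
```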
1. Train:

```shell
cd personality_detection
bash scripts/run.sh
```

2. Test:

```shell
cd personality_detection
bash scripts/test.sh
```

If you find this code useful, please cite our paper:
```bibtex
@inproceedings{tang2025towards,
  title     = {Towards Transferable Personality Representation Learning based on Triplet Comparisons and Its Applications},
  author    = {Kai Tang and Rui Wang and Renyu Zhu and Minmin Lin and Xiao Ding and Tangjie Lv and Changjie Fan and Runze Wu and Haobo Wang},
  booktitle = {The Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2025}
}
```