This repository contains the official implementation of the paper "Towards Transferable Personality Representation Learning based on Triplet Comparisons and Its Applications" (EMNLP 2025 Main Conference).
The code requires the following dependencies (see `requirements.txt`):

```
torch>=1.8.0
numpy
pandas
scikit-learn
transformers
datasets
tqdm
scipy
```
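The dependencies can be installed with `pip install -r requirements.txt`. As an optional sanity check (a sketch, not part of the repository), the following snippet reports which of them are not importable in the current environment:

```python
import importlib.util

# Import names for the packages above ("scikit-learn" imports as "sklearn")
REQUIRED = ["torch", "numpy", "pandas", "sklearn",
            "transformers", "datasets", "tqdm", "scipy"]

missing = [name for name in REQUIRED
           if importlib.util.find_spec(name) is None]
print("missing dependencies:", missing or "none")
```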
Download: The datasets are available on Google Drive: Download Datasets
Organization: The dataset contains three main folders:

- `utterances`: Raw single-sentence corpora.
- `triplet`: Generated and filtered triplets used for encoder training.
- `by-product`: By-product datasets used for downstream verification (personality detection).
After downloading, please organize the data files into the data/ directory.
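Assuming the three folder names above are kept as-is (an assumption; adjust to match the actual download), the expected layout under `data/` can be created with:

```shell
# Create the data/ layout described above
# (folder names assumed from the dataset description)
mkdir -p data/utterances data/triplet data/by-product
```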
The training process consists of two stages: Pre-training (Contrastive Learning) and Downstream Personality Detection. We use contrastive learning to fine-tune the BERT embeddings.
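For intuition only, a standard triplet margin loss, which pulls an anchor embedding toward a positive and away from a negative, can be sketched in NumPy as follows (this is a generic formulation, not the repository's exact training objective):

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss on Euclidean distances: encourages
    d(anchor, positive) + margin <= d(anchor, negative)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

# Toy 2-D embeddings: the negative is already farther than the positive
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])
n = np.array([3.0, 0.0])
print(triplet_margin_loss(a, p, n))  # 0.0 -- margin already satisfied
```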
1. Warm-up Training: This step performs Masked Language Modeling (MLM) or similar warm-up tasks.

```shell
cd embedding
bash scripts/train_warm_ml.sh
```

2. Contrastive Pre-training: Train the encoder using contrastive loss.

```shell
cd embedding
bash scripts/train.sh
```

The trained model will be saved in `embedding/output_embedding/` (or as configured in the script).
Train the classifier for personality traits (e.g., MBTI/Big5) using the pre-trained embeddings.
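Conceptually, each trait (e.g., one MBTI dimension) is a binary classification problem on top of the frozen pre-trained embeddings. A minimal NumPy sketch of such a per-trait classifier, using plain logistic regression trained by gradient descent (illustrative only; the repository's scripts implement the actual detector):

```python
import numpy as np

def train_trait_classifier(X, y, lr=0.1, steps=500):
    """Logistic regression on fixed embeddings X of shape (n, d)
    with binary trait labels y in {0, 1}. Returns weights and bias."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid
        w -= lr * (X.T @ (p - y) / n)           # gradient of the log loss
        b -= lr * np.mean(p - y)
    return w, b

# Toy "embeddings": two linearly separable clusters for one trait
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, (20, 4)), rng.normal(1, 0.3, (20, 4))])
y = np.array([0] * 20 + [1] * 20)
w, b = train_trait_classifier(X, y)
acc = np.mean(((X @ w + b) > 0).astype(int) == y)
print("train accuracy:", acc)
```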
1. Train:

```shell
cd personality_detection
bash scripts/run.sh
```

2. Test:

```shell
cd personality_detection
bash scripts/test.sh
```

If you find this code useful, please cite our paper:
```bibtex
@inproceedings{tang2025towards,
  title     = {Towards Transferable Personality Representation Learning based on Triplet Comparisons and Its Applications},
  author    = {Kai Tang and Rui Wang and Renyu Zhu and Minmin Lin and Xiao Ding and Tangjie Lv and Changjie Fan and Runze Wu and Haobo Wang},
  booktitle = {The Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2025}
}
```