This repository contains code for the paper Bottom-Up and Top-Down: Predicting Personality with Psycholinguistic and Language Model Features, published in IEEE International Conference of Data Mining 2020.
Here are a set of experiments written in tensorflow + pytorch to explore automated personality detection using Language Models on the Essays dataset (Big-Five personality labelled traits) and the Kaggle MBTI dataset.
Pull the repository from GitHub, followed by creating a new virtual environment (conda or venv):
git clone https://github.com/yashsmehta/personality-prediction.git
cd personality-prediction
conda create -n mvenv python=3.10
Install poetry, and use that to install the dependencies required for running the project:
curl -sSL https://install.python-poetry.org | python3 -
poetry install
First run the LM extractor code which passes the dataset through the language model and stores the embeddings (of all layers) in a pickle file. Creating this 'new dataset' saves us a lot of compute time and allows effective searching of the hyperparameters for the finetuning network. Before running the code, create a pkl_data folder in the repo folder. All the arguments are optional and passing no arguments runs the extractor with the default values.
python LM_extractor.py -dataset_type 'essays' -token_length 512 -batch_size 32 -embed 'bert-base' -op_dir 'pkl_data'
Next run a finetuning model to take the extracted features as input from the pickle file and train a finetuning model. We find a shallow MLP to be the best performing one
python finetune_models/MLP_LM.py
Results Table | Language Models vs Psycholinguistic Traits |
---|---|
Follow the steps below for predicting personality (e.g. the Big-Five: OCEAN traits) on a new text/essay:
python finetune_models/MLP_LM.py -save_model 'yes'
Now use the script below to predict the unseen text:
python unseen_predictor.py
LM_extractor.py
On a RTX2080 GPU, the -embed 'bert-base' extractor takes about ~2m 30s and 'bert-large' takes about ~5m 30s
On a CPU, 'bert-base' extractor takes about ~25m
python finetune_models/MLP_LM.py
On a RTX2080 GPU, running for 15 epochs (with no cross-validation) takes from 5s-60s, depending on the MLP architecture.
Deep Learning based Personality Prediction [Literature REVIEW] (Springer AIR Journal - 2020)
@article{mehta2020recent,
title={Recent Trends in Deep Learning Based Personality Detection},
author={Mehta, Yash and Majumder, Navonil and Gelbukh, Alexander and Cambria, Erik},
journal={Artificial Intelligence Review},
pages={2313–2339},
year={2020},
doi = {https://doi.org/10.1007/s10462-019-09770-z},
url = {https://link.springer.com/article/10.1007/s10462-019-09770-z}
publisher={Springer}
}
Language Model Based Personality Prediction (ICDM - 2020)
If you find this repo useful for your research, please cite it using the following:
@inproceedings{mehta2020bottom,
title={Bottom-up and top-down: Predicting personality with psycholinguistic and language model features},
author={Mehta, Yash and Fatehi, Samin and Kazameini, Amirmohammad and Stachl, Clemens and Cambria, Erik and Eetemadi, Sauleh},
booktitle={2020 IEEE International Conference on Data Mining (ICDM)},
pages={1184--1189},
year={2020},
organization={IEEE}
}
The source code for this project is licensed under the MIT license.