This is the repository for the paper *AI generates covertly racist decisions about people based on their dialect*. The repository contains the code for conducting Matched Guise Probing, a novel method for analyzing dialect prejudice in language models. It also includes a demo illustrating how to use the code as well as scripts and notebooks for replicating the experiments and analyses from the paper.
All requirements can be found in `requirements.txt`. If you use `conda`, create a new environment and install the required dependencies there:
```bash
conda create -n dialect-prejudice python=3.10
conda activate dialect-prejudice
git clone https://github.com/valentinhofmann/dialect-prejudice.git
cd dialect-prejudice
pip install -r requirements.txt
```
Similarly, if you use `virtualenv`, create a new environment and install the required dependencies there:
```bash
python -m virtualenv -p python3.10 dialect-prejudice
source dialect-prejudice/bin/activate
git clone https://github.com/valentinhofmann/dialect-prejudice.git
cd dialect-prejudice
pip install -r requirements.txt
```
The setup should only take a few moments.
Matched Guise Probing requires three types of data: two sets of texts that differ by dialect (e.g., African American English and Standard American English), a set of tokens that we want to analyze (e.g., trait adjectives), and a set of prompts. Put the two sets of texts as a tab-separated text file into `data/pairs`. We have included an example file, which is also used in the demo. Put the set of tokens as a text file into `data/attributes`. `data/attributes` contains several example files (e.g., the trait adjectives from the Princeton Trilogy used in the paper). Finally, define the set of prompts in `probing/prompting.py`. `probing/prompting.py` contains all prompts used in the paper.
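For illustration, the following sketch creates a minimal pairs file and attributes file in the expected formats. The file names and texts are made-up examples, not the example data shipped with the repository:

```python
# Sketch: create a minimal pairs file and attributes file.
# File names and texts below are made-up illustrations, not the
# example data shipped with the repository.

# Each line of a pairs file holds two tab-separated texts,
# e.g., an AAE text and its SAE counterpart.
pairs = [
    ("she be workin late", "she is usually working late"),
    ("he finna go home", "he is about to go home"),
]
with open("data/pairs/demo_pairs.txt", "w") as f:
    for aae, sae in pairs:
        f.write(f"{aae}\t{sae}\n")

# An attributes file simply lists one token per line.
attributes = ["intelligent", "lazy", "calm", "aggressive"]
with open("data/attributes/demo_attributes.txt", "w") as f:
    f.write("\n".join(attributes) + "\n")
```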
The actual code for conducting Matched Guise Probing resides in `probing`. Simply run the following command:
```bash
python3.10 mgp.py \
    --model $model \
    --variable $variable \
    --attribute $attribute \
    --device $device
```
The meaning of the individual arguments is as follows:

- `$model` is the name of the model being used (e.g., `t5-large`).
- `$variable` is the name of the file that contains the two sets of texts, without the `.txt` extension.
- `$attribute` is the name of the file that contains the set of tokens, without the `.txt` extension.
- `$device` specifies the device on which to run the code.
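For example, using the hypothetical files `demo_pairs.txt` and `demo_attributes.txt` from the sketch above, an invocation might look like this:

```bash
python3.10 mgp.py \
    --model t5-large \
    --variable demo_pairs \
    --attribute demo_attributes \
    --device cuda:0
```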
For OpenAI models, you need to put your OpenAI API key into a file called `.env` at the root of the repository (e.g., `OPENAI_KEY=123456789`). We also use separate Python files to conduct Matched Guise Probing with OpenAI models. For example, you can run the following command for GPT-4:
```bash
python3.10 mgp_gpt4.py \
    --model $model \
    --variable $variable \
    --attribute $attribute
```
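If you want to access the key from your own code, a minimal sketch using `python-dotenv` looks as follows (how the repository's scripts actually load the key may differ, so check `mgp_gpt4.py`):

```python
# Sketch: read the OpenAI key from the .env file at the repository root.
# This uses python-dotenv; the repository's own loading logic may differ.
import os

from dotenv import load_dotenv

load_dotenv()                      # parses .env and sets the variables
api_key = os.getenv("OPENAI_KEY")  # variable name as given above
```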
To run experiments that ask the models to make a discrete decision for each input text (e.g., the conviction experiment in the paper), you can use the same syntax as for general Matched Guise Probing. Simply put the decision tokens as a text file into `data/attributes` and specify a set of suitable prompts in `probing/prompting.py`. Since the models might assign different prior probabilities to the decision tokens, we recommend using calibration based on the token probabilities in a neutral context, as sketched below. To do so, you can use the `--calibrate` argument.
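Conceptually, calibration divides each decision token's probability under the actual prompt by its probability in a neutral context and renormalizes. A minimal sketch of this idea (the actual implementation behind `--calibrate` may differ; see the code in `probing`):

```python
# Sketch of calibration with a neutral context; the actual logic behind
# the --calibrate argument may differ (see the code in probing).
import numpy as np

def calibrate(p_decision: np.ndarray, p_neutral: np.ndarray) -> np.ndarray:
    """Remove the model's prior bias toward certain decision tokens by
    dividing by their probabilities in a neutral context."""
    scores = p_decision / p_neutral
    return scores / scores.sum()

# Hypothetical probabilities for two decision tokens:
p_decision = np.array([0.02, 0.01])  # given the actual prompt
p_neutral = np.array([0.04, 0.01])   # given a neutral prompt
print(calibrate(p_decision, p_neutral))  # [0.333..., 0.666...]
```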
All prediction probabilities are stored in `probing/probs`. We have included examples in `notebooks` that show how to load and analyze these prediction probabilities. Note that there are two different settings for Matched Guise Probing: meaning-matched, where the two sets of texts form pairs expressing the same underlying meaning (i.e., the two tab-separated texts on each line in the text file belong together), and non-meaning-matched, where the two sets of texts are independent from each other. The file `notebooks/helpers.py` contains two functions for loading predictions in these two settings (i.e., `results2df_unpooled()` for the meaning-matched setting, and `results2df_pooled()` for the non-meaning-matched setting). Alternatively, you can also add the name of the text file to the lists `UNPOOLED_VARIABLES` or `POOLED_VARIABLES` in `notebooks/helpers.py` and use the function `results2df()`.
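As a minimal sketch, loading meaning-matched predictions from a notebook might look as follows; the arguments passed to the helper function are hypothetical, so check `notebooks/helpers.py` for the exact signatures:

```python
# Sketch: load prediction probabilities in a notebook. The arguments
# below are hypothetical -- see notebooks/helpers.py for the exact
# signatures of the loading functions.
import sys

sys.path.append("notebooks")
import helpers

# Meaning-matched setting (paired texts on each line):
df = helpers.results2df_unpooled("t5-large", "demo_pairs", "demo_attributes")
print(df.head())
```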
We have created a demo that provides a worked-through example for using the code in this repository. Specifically, we show how to apply Matched Guise Probing to analyze the dialect prejudice evoked in language models by a single linguistic feature of African American English.
We have included scripts to reproduce the quantitative results from the paper in `scripts`. The scripts expect the data from Blodgett et al. (2016) and Groenwold et al. (2020) as tab-separated text files in `data/pairs` (see above). To replicate all experiments, run:
```bash
bash scripts/run_stereotype_experiment.sh $device
bash scripts/run_feature_experiment.sh $device
bash scripts/run_employability_experiment.sh $device
bash scripts/run_criminality_experiment.sh $device
bash scripts/run_scaling_experiment.sh $device
bash scripts/run_human_feedback_experiment.sh $device
```
Furthermore, we have included notebooks with the analyses from the paper, including the creation of plots and statistical tests, in `notebooks`.
If you make use of the code in this repository, please cite the following paper:
```bibtex
@article{hofmann2024dialect,
  title={AI generates covertly racist decisions about people based on their dialect},
  author={Valentin Hofmann and Pratyusha Ria Kalluri and Dan Jurafsky and Sharese King},
  journal={Nature},
  volume={633},
  pages={147--154},
  url={https://www.nature.com/articles/s41586-024-07856-5},
  year={2024}
}
```