This is the repository for the paper *AI generates covertly racist decisions about people based on their dialect*. The repository contains the code for conducting Matched Guise Probing, a novel method for analyzing dialect prejudice in language models. It also includes a demo illustrating how to use the code as well as scripts and notebooks for replicating the experiments and analyses from the paper.
All requirements can be found in `requirements.txt`. If you use `conda`, create a new environment and install the required dependencies there:
```bash
conda create -n dialect-prejudice python=3.10
conda activate dialect-prejudice
git clone https://github.com/valentinhofmann/dialect-prejudice.git
cd dialect-prejudice
pip install -r requirements.txt
```
Similarly, if you use `virtualenv`, create a new environment and install the required dependencies there:
```bash
python -m virtualenv -p python3.10 dialect-prejudice
source dialect-prejudice/bin/activate
git clone https://github.com/valentinhofmann/dialect-prejudice.git
cd dialect-prejudice
pip install -r requirements.txt
```
The setup should only take a few moments.
Matched Guise Probing requires three types of data: two sets of texts that differ by dialect (e.g., African American English and Standard American English), a set of tokens that we want to analyze (e.g., trait adjectives), and a set of prompts. Put the two sets of texts as a tab-separated text file into `data/pairs`. We have included an example file, which is also used in the demo. Put the set of tokens as a text file into `data/attributes`. `data/attributes` contains several example files (e.g., the trait adjectives from the Princeton Trilogy used in the paper). Finally, define the set of prompts in `probing/prompting.py`. `probing/prompting.py` contains all prompts used in the paper.
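For illustration, the following sketch creates a minimal pairs file and attributes file in the expected formats. The file names and texts are made-up examples, not the example data shipped with the repository:

```python
# Sketch: create a minimal pairs file and attributes file.
# File names and texts below are made-up illustrations, not the
# example data shipped with the repository.

# Each line of a pairs file holds two tab-separated texts,
# e.g., an AAE text and its SAE counterpart.
pairs = [
    ("she be workin late", "she is usually working late"),
    ("he finna go home", "he is about to go home"),
]
with open("data/pairs/demo_pairs.txt", "w") as f:
    for aae, sae in pairs:
        f.write(f"{aae}\t{sae}\n")

# An attributes file simply lists one token per line.
attributes = ["intelligent", "lazy", "calm", "aggressive"]
with open("data/attributes/demo_attributes.txt", "w") as f:
    f.write("\n".join(attributes) + "\n")
```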
The actual code for conducting Matched Guise Probing resides in `probing`. Simply run the following command:
```bash
python3.10 mgp.py \
    --model $model \
    --variable $variable \
    --attribute $attribute \
    --device $device
```
The meaning of the individual arguments is as follows:

- `$model` is the name of the model being used (e.g., `t5-large`).
- `$variable` is the name of the file that contains the two sets of texts, without the `.txt` extension.
- `$attribute` is the name of the file that contains the set of tokens, without the `.txt` extension.
- `$device` specifies the device on which to run the code.
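For example, using the hypothetical files `demo_pairs.txt` and `demo_attributes.txt` from the sketch above, an invocation might look like this:

```bash
python3.10 mgp.py \
    --model t5-large \
    --variable demo_pairs \
    --attribute demo_attributes \
    --device cuda:0
```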
For OpenAI models, you need to put your OpenAI API key into a file called `.env` at the root of the repository (e.g., `OPENAI_KEY=123456789`). We also use separate Python files to conduct Matched Guise Probing with OpenAI models. For example, you can run the following command for GPT-4:
```bash
python3.10 mgp_gpt4.py \
    --model $model \
    --variable $variable \
    --attribute $attribute
```
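If you want to access the key from your own code, a minimal sketch using `python-dotenv` looks as follows (how the repository's scripts actually load the key may differ, so check `mgp_gpt4.py`):

```python
# Sketch: read the OpenAI key from the .env file at the repository root.
# This uses python-dotenv; the repository's own loading logic may differ.
import os

from dotenv import load_dotenv

load_dotenv()                      # parses .env and sets the variables
api_key = os.getenv("OPENAI_KEY")  # variable name as given above
```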
To run experiments that ask the models to make a discrete decision for each input text (e.g., the conviction experiment in the paper), you can use the same syntax as for general Matched Guise Probing. Simply put the decision tokens as a text file into `data/attributes` and specify a set of suitable prompts in `probing/prompting.py`. Since the models might assign different prior probabilities to the decision tokens, we recommend using calibration based on the token probabilities in a neutral context, as sketched below. To do so, you can use the `--calibrate` argument.
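Conceptually, calibration divides each decision token's probability under the actual prompt by its probability in a neutral context and renormalizes. A minimal sketch of this idea (the actual implementation behind `--calibrate` may differ; see the code in `probing`):

```python
# Sketch of calibration with a neutral context; the actual logic behind
# the --calibrate argument may differ (see the code in probing).
import numpy as np

def calibrate(p_decision: np.ndarray, p_neutral: np.ndarray) -> np.ndarray:
    """Remove the model's prior bias toward certain decision tokens by
    dividing by their probabilities in a neutral context."""
    scores = p_decision / p_neutral
    return scores / scores.sum()

# Hypothetical probabilities for two decision tokens:
p_decision = np.array([0.02, 0.01])  # given the actual prompt
p_neutral = np.array([0.04, 0.01])   # given a neutral prompt
print(calibrate(p_decision, p_neutral))  # [0.333..., 0.666...]
```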
All prediction probabilities are stored in `probing/probs`. We have included examples in `notebooks` that show how to load and analyze these prediction probabilities. Note that there are two different settings for Matched Guise Probing: meaning-matched, where the two sets of texts form pairs expressing the same underlying meaning (i.e., the two tab-separated texts on each line in the text file belong together), and non-meaning-matched, where the two sets of texts are independent from each other. The file `notebooks/helpers.py` contains two functions for loading predictions in these two settings (i.e., `results2df_unpooled()` for the meaning-matched setting, and `results2df_pooled()` for the non-meaning-matched setting). Alternatively, you can also add the name of the text file to the lists `UNPOOLED_VARIABLES` or `POOLED_VARIABLES` in `notebooks/helpers.py` and use the function `results2df()`.
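As a minimal sketch, loading meaning-matched predictions from a notebook might look as follows; the arguments passed to the helper function are hypothetical, so check `notebooks/helpers.py` for the exact signatures:

```python
# Sketch: load prediction probabilities in a notebook. The arguments
# below are hypothetical -- see notebooks/helpers.py for the exact
# signatures of the loading functions.
import sys

sys.path.append("notebooks")
import helpers

# Meaning-matched setting (paired texts on each line):
df = helpers.results2df_unpooled("t5-large", "demo_pairs", "demo_attributes")
print(df.head())
```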
We have created a demo that provides a worked-through example for using the code in this repository. Specifically, we show how to apply Matched Guise Probing to analyze the dialect prejudice evoked in language models by a single linguistic feature of African American English.
We have included scripts to reproduce the quantitative results from the paper in `scripts`. The scripts expect the data from Blodgett et al. (2016) and Groenwold et al. (2020) as tab-separated text files in `data/pairs` (see above). To replicate all experiments, run:
```bash
bash scripts/run_stereotype_experiment.sh $device
bash scripts/run_feature_experiment.sh $device
bash scripts/run_employability_experiment.sh $device
bash scripts/run_criminality_experiment.sh $device
bash scripts/run_scaling_experiment.sh $device
bash scripts/run_human_feedback_experiment.sh $device
```
Furthermore, we have included notebooks with the analyses from the paper, including the creation of plots and statistical tests, in `notebooks`.
If you make use of the code in this repository, please cite the following paper:
```bibtex
@article{hofmann2024dialect,
  title={AI generates covertly racist decisions about people based on their dialect},
  author={Valentin Hofmann and Pratyusha Ria Kalluri and Dan Jurafsky and Sharese King},
  journal={Nature},
  volume={633},
  pages={147--154},
  url={https://www.nature.com/articles/s41586-024-07856-5},
  year={2024}
}
```