Here we recreate plots from "Can AI Help Reduce Disparities in General Medical and Mental Health Care?" by Chen, Szolovits, and Ghassemi, 2019 (AMA Journal of Ethics).
Because the psychiatric dataset is proprietary, we cannot share it. The same code is used for both datasets.
We demonstrate:
- Data heterogeneity in the MIMIC clinical notes through LDA topic modeling, and disparities in topics by race, gender, and insurance type
- Disparities in predictive accuracy by race, gender, and insurance type
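As a rough illustration of the second point, one simple way to look at disparities in predictive accuracy is to compute the error rate (zero-one loss) separately for each group. The sketch below is not taken from this repository: the function name `error_rate_by_group`, the group labels, and the toy data are all hypothetical, and the notebook may compute and plot these quantities differently.

```python
from collections import defaultdict


def error_rate_by_group(y_true, y_pred, groups):
    """Return {group: fraction of misclassified examples} (zero-one loss).

    Hypothetical helper for illustration; not part of this repository.
    """
    errors = defaultdict(int)
    counts = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        counts[group] += 1
        if truth != pred:
            errors[group] += 1
    return {group: errors[group] / counts[group] for group in counts}


# Toy, made-up labels purely for illustration (not MIMIC data).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
insurance = ["Private", "Private", "Public", "Public", "Public", "Private"]

rates = error_rate_by_group(y_true, y_pred, insurance)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f}")
```

A gap between the per-group error rates is the kind of disparity the plots are meant to surface. With real data, the group sizes matter, so a confidence interval per group would be a sensible addition.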
- Get MIMIC notes from `make_mimic_notes.py`. You will need to adjust the username and the location of the MIMIC data.
- Get Mallet topics from the notes. We convert the notes into separate text files in `make_mallet_data.py`, then run Mallet in `run_mallet_topics.sh`.
- Create plots in `Recreate_Plots.ipynb`.
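
To make the "separate text files" step concrete, here is a minimal sketch of writing one text file per note, which is a layout Mallet can read with its `import-dir` command. This is an assumption about the general approach, not a copy of `make_mallet_data.py`: the function name `write_notes_as_files`, the `(note_id, text)` input format, and the file naming scheme are all hypothetical.

```python
from pathlib import Path


def write_notes_as_files(notes, out_dir):
    """Write each (note_id, text) pair to <out_dir>/<note_id>.txt.

    Hypothetical helper for illustration; the repository's script may
    name and organize the files differently.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for note_id, text in notes:
        file_path = out_path / f"{note_id}.txt"
        file_path.write_text(text, encoding="utf-8")
        written.append(file_path)
    return written
```

For example, `write_notes_as_files([(1, "first note"), (2, "second note")], "mallet_input")` would create `mallet_input/1.txt` and `mallet_input/2.txt`. One file per note keeps each note as a separate document for topic modeling.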
- MIMIC data access
- Mallet for topic modeling
- Python packages listed in `requirements.txt`