curate-mimic - A repository to guide in demonstrating how to curate the MIMIC III database with Natural Language Processing
-
Preliminary steps:
- Obtain access to the MIMIC III database (ask your PI).
- Obtain a UMLS account and API key
- Install Docker and docker-compose
- If writing to a MongoDB database:
- Create local directory:
mkdir mimic_db - docker run --rm --name mongodb -d -v mimic_db:/data/db -p 27017:27017 mongo
- Create local directory:
-
Setup cTAKES containers:
git clone git@github.com:Machine-Learning-for-Medical-Language/ctakes-rest-package.gitcd ctakes-rest-packageexport umls_api_key=<api key from above>docker-compose up -d --scale ctakes=N# This starts N containers -- each requires around 4 GB RAM.
-
Run the python script to process the data -- run with -h to receive detailed documentation of the options:
python process_mimic.py --input-path <path to NOTEEVENTS.csv file> --output-format <json|mongo|xmi|fhir> --output_dir <directory to write files if output format is file-based>- If you run with the flag
--max-notes N, you can run on a subset to make sure everything is working correctly before processing the whole dataset.
-
If you used MongoDB, check into some of the output:
$ mongo> use mimic> db.note_nlp.stats()['count']# Should return number of notes processed and entered into database