Neural Record Captioning (NRC)

This repository contains the code for the paper "Natural Language Generation for Electronic Health Records."

what's included

Keras code for the NRC model.
Training and testing scripts for the model.
Example scripts for preprocessing EHR data to be used in the model.

getting started

Install the necessary Python modules (listed below)
Use preprocessing/sparisfy.py to convert the discrete variables in your EHRs to a sparse format
Use preprocessing/words_to_integers.py to convert your free-text field to integers
Train the autoencoder on the sparse records with ae_training.py
Train the NRC model with caption_training.py
Generate text with caption_testing.py
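
To make the two preprocessing steps above more concrete, here is a minimal, dependency-free sketch of what "sparse format" and "words to integers" can mean. It is not the code in preprocessing/; the function names, the example field names (sex, mode_of_arrival), and the choice to reserve 0 for padding and 1 for unknown words are all assumptions made for illustration.

```python
from collections import Counter


def sparsify(records, columns):
    """One-hot encode discrete columns as (row, column_index) coordinates."""
    # Assign one output column to every distinct (variable, value) pair.
    feature_index = {}
    for record in records:
        for col in columns:
            feature_index.setdefault((col, record[col]), len(feature_index))
    # Store only the coordinates of the nonzero entries.
    coords = [
        (row, feature_index[(col, record[col])])
        for row, record in enumerate(records)
        for col in columns
    ]
    return coords, feature_index


def words_to_integers(texts, min_count=1):
    """Map each token to an integer ID (assumed: 0 = padding, 1 = unknown)."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, n in counts.most_common():
        if n >= min_count:
            vocab[tok] = len(vocab)
    encoded = [
        [vocab.get(tok, vocab["<unk>"]) for tok in text.lower().split()]
        for text in texts
    ]
    return encoded, vocab


# Hypothetical records; real EHR fields will differ.
records = [
    {"sex": "F", "mode_of_arrival": "walk-in"},
    {"sex": "M", "mode_of_arrival": "ambulance"},
    {"sex": "F", "mode_of_arrival": "ambulance"},
]
coords, feature_index = sparsify(records, ["sex", "mode_of_arrival"])
encoded, vocab = words_to_integers(["chest pain", "pain in chest and arm"])
```

Each record contributes exactly one nonzero entry per discrete variable, so the coordinate list stays small even when the number of distinct values is large. Raising min_count maps rare words to the unknown ID, which keeps the vocabulary manageable for noisy free text.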

required software

Python 3.x
Keras with the TensorFlow backend
Pandas, NumPy, h5py, and scikit-learn
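
As a quick sanity check before running the scripts, something like the following can report which of the packages above are not importable. This is only a sketch: the import names in REQUIRED are our assumption about how the listed packages are imported (for example, scikit-learn is imported as sklearn), and the check does not verify versions.

```python
import importlib.util
import sys

# Assumed import names for the packages listed above.
REQUIRED = ["keras", "tensorflow", "pandas", "numpy", "h5py", "sklearn"]


def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if sys.version_info < (3,):
    print("Python 3.x is required")

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All required modules were found")
```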

hot tips

The default hyperparameters worked well for the data used in our paper, but they might not for yours, so feel free to experiment! Also, we recommend a GPU for training the captioning model. We used a single NVIDIA Titan X for our experiments, and training with ~2 million records took around 6 hours.
