Neural Record Captioning (NRC)

This repository contains the code for the paper "Natural Language Generation for Electronic Health Records."

what's included

Keras code for the NRC model.
Training and testing scripts for the model.
Example scripts for preprocessing EHR data to be used in the model.

getting started

Install the necessary Python modules (listed under "required software" below)
Use preprocessing/sparisfy.py to convert the discrete variables in your EHRs to sparse format
Use preprocessing/words_to_integers.py to convert your free text field to integers
Train the autoencoder on the sparse records with ae_training.py
Train the NRC model with caption_training.py
Generate text with caption_testing.py
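
To make the two preprocessing steps above more concrete, here is a minimal, standard-library-only sketch of what "sparse format" and "words to integers" can mean. It is not the code in the preprocessing scripts: the function names (sparsify, build_vocab, words_to_ints), the reserved IDs for padding and unknown words, and the example field names ("sex", "mode") are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def sparsify(records, columns):
    """One-hot encode discrete columns (illustrative helper, not the repo's script).

    Returns, for each record, the sorted indices of its active one-hot
    columns, plus the (column, value) -> index mapping that was built.
    """
    index = {}
    for col in columns:
        for value in sorted({rec[col] for rec in records}):
            index[(col, value)] = len(index)
    rows = [sorted(index[(col, rec[col])] for col in columns) for rec in records]
    return rows, index


def build_vocab(texts, min_count=1):
    """Map words to integer IDs; 0 (padding) and 1 (unknown) are assumed conventions."""
    counts = Counter(word for text in texts for word in text.lower().split())
    vocab = {"<pad>": 0, "<unk>": 1}
    # Most frequent words get the smallest IDs; ties are broken alphabetically.
    for word, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count:
            vocab[word] = len(vocab)
    return vocab


def words_to_ints(text, vocab):
    """Convert a free-text field to a list of integer IDs."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]


# Hypothetical toy records; real EHR fields will differ.
records = [
    {"sex": "F", "mode": "walk-in", "text": "chest pain"},
    {"sex": "M", "mode": "ambulance", "text": "chest tightness"},
]
rows, index = sparsify(records, ["sex", "mode"])
vocab = build_vocab(rec["text"] for rec in records)
encoded = [words_to_ints(rec["text"], vocab) for rec in records]
```

Here each record becomes a short list of active column indices rather than a long vector of mostly zeros, and each text field becomes a list of integers that a sequence model can consume.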

required software

Python 3.x
Keras with the TensorFlow backend
Pandas, NumPy, h5py, and scikit-learn
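
As a quick sanity check before running the scripts, something like the following can report which of the packages above are not importable in the current environment. This is only a sketch: the helper name missing_modules is hypothetical, and the list uses import names (scikit-learn is imported as sklearn).

```python
from importlib.util import find_spec

# Import names for the packages listed above (scikit-learn imports as "sklearn").
REQUIRED_MODULES = ["keras", "tensorflow", "pandas", "numpy", "h5py", "sklearn"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of module names that cannot be found (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that the modules can be located; it does not verify versions or that TensorFlow can see a GPU.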

hot tips

The default hyperparameters worked well for the data used in our paper, but they might not for yours, so feel free to experiment! Also, we recommend a GPU for training the captioning model. We used a single NVIDIA Titan X for our experiments, and training with ~2 million records took around 6 hours.
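
For a rough sense of how long training might take on your own data, the figures above (about 2 million records in around 6 hours) can be scaled as in the sketch below. The linear scaling is an assumption made here, not a claim from the paper, and it ignores differences in hardware, record length, and hyperparameters. The function name estimate_training_hours is hypothetical.

```python
def estimate_training_hours(n_records, ref_records=2_000_000, ref_hours=6.0):
    """Back-of-the-envelope estimate that assumes time scales linearly with record count."""
    if n_records < 0:
        raise ValueError("n_records must be non-negative")
    return n_records * ref_hours / ref_records


# Example: a dataset a quarter the size of the reference one.
hours = estimate_training_hours(500_000)  # 1.5 under the linear assumption
```

Treat the result as an order-of-magnitude guide only.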

scotthlee/NRC
