Clinical Word Embeddings

By Zachary Flamholz, Andrew Crane-Droesch, Lyle Ungar, Gary Weissman

Description

Pre-trained word embeddings built from the text of published clinical case reports. See the pre-preprint (manu_clinical_embeddings_12.4.2019.pdf in this repository) for a detailed description of the methods used to build and test the word embeddings.

Download

Model	Dimension	Open Access Case Reports	Open Access All Manuscripts
word2vec	100	Download - 269 MB	Download - 2.7 GB
word2vec	300	Download - 716 MB	Download - 7.8 GB
word2vec	600	Download - 1.4 GB
fastText	100	Download - 798 MB	Download - 4.7 GB
fastText	300	Download - 2.3 GB	Download - 13.8 GB
fastText	600	Download - 4.6 GB
GloVe	100	Download - 157 MB	Download - 1.3 GB
GloVe	300	Download - 445 MB	Download - 3.8 GB
GloVe	600	Download - 862 MB	Download - 7.4 GB

Details

The word embeddings are saved in formats that the gensim Python package can load.
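Because the three model families go through different gensim classes, a small dispatch helper can keep loading code uniform. The following is a minimal sketch, not the authors' code: the function name load_keyed_vectors and the kind labels are hypothetical, and the GloVe branch assumes the vectors were saved in gensim's native format with KeyedVectors.save(). If the GloVe files turn out to be in word2vec text or binary format, KeyedVectors.load_word2vec_format would be the call to try instead.

```python
def load_keyed_vectors(path, kind):
    """Return the gensim KeyedVectors stored at `path`.

    `kind` is a hypothetical label chosen for this sketch:
    'word2vec', 'fasttext', or 'glove'.
    """
    if kind not in ("word2vec", "fasttext", "glove"):
        raise ValueError(f"unknown model kind: {kind!r}")

    # Imported lazily so the argument check above runs even without gensim.
    from gensim.models import FastText, KeyedVectors, Word2Vec

    if kind == "word2vec":
        # Full Word2Vec model; the vectors live on the .wv attribute.
        return Word2Vec.load(path).wv
    if kind == "fasttext":
        # Full FastText model; the vectors live on the .wv attribute.
        return FastText.load(path).wv
    # Assumption: GloVe vectors were saved with KeyedVectors.save().
    return KeyedVectors.load(path)


# Example call, using the file name from the quick start:
# vectors = load_keyed_vectors('w2v_oa_all_100d.bin', 'word2vec')
```

Returning the KeyedVectors object in every case means downstream code can index and compare words the same way regardless of which model was downloaded.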

Quick start

First download and extract the files from each archive.

tar -xvf w2v_100d_oa_all.tar.gz
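For readers who prefer to stay in Python, the same extraction step can be sketched with the standard-library tarfile module. The helper name extract_archive and the default destination directory are illustrative choices, not part of the original instructions.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a (possibly gzip-compressed) tar archive and return its member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:*" lets tarfile detect the compression automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        tar.extractall(dest)
        return tar.getnames()


# Example call, using the archive name from the shell command above:
# extract_archive("w2v_100d_oa_all.tar.gz")
```

Only extract archives from a source you trust, since tar members can carry arbitrary paths.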

Then load the embeddings into Python.

from gensim.models import FastText, Word2Vec, KeyedVectors # KeyedVectors are used to load the GloVe models

# Load the model
model = Word2Vec.load('w2v_oa_all_100d.bin')

# Return the 100-dimensional vector representation of each word
# (indexing works in gensim 3.x and 4.x; word_vec() is deprecated in gensim 4)
model.wv['diabetes']
model.wv['cardiac_arrest']
model.wv['lymphangioleiomyomatosis']

# Try out cosine similarity
model.wv.similarity('copd', 'chronic_obstructive_pulmonary_disease')
model.wv.similarity('myocardial_infarction', 'heart_attack')
model.wv.similarity('lymphangioleiomyomatosis', 'lam')
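To make the similarity score concrete, here is a minimal pure-Python sketch of cosine similarity, the quantity the similarity calls above report. The toy vectors are made up for illustration and are not taken from the models; the function name cosine_similarity is likewise a choice for this sketch.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors (illustrative only)
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, so the score is close to 1
c = [3.0, 0.0, -1.0]  # orthogonal to a: 1*3 + 2*0 + 3*(-1) = 0

print(cosine_similarity(a, b))
print(cosine_similarity(a, c))
```

Scores range from -1 (opposite directions) to 1 (same direction). Applied to two vectors looked up from a loaded model, this function should agree with model.wv.similarity up to floating-point rounding.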
