
deepBioWSD is a single Bidirectional Long Short-Term Memory (BLSTM) network that collectively performs deep word sense disambiguation (WSD) of biomedical text data.


iwera-git/deepBioWSD



deepBioWSD: Effective Deep Neural Word Sense Disambiguation of Biomedical Text Data


This repo provides the implementation of our paper titled "deepBioWSD: effective deep neural word sense disambiguation of biomedical text data", published in JAMIA (Journal of the American Medical Informatics Association).

Background

With recent advances in biomedicine, a wealth of information is hidden in unstructured narratives such as research articles and clinical documents. A highly accurate Word Sense Disambiguation (WSD) algorithm can prevent a myriad of downstream difficulties in the natural language processing (NLP) pipelines that mine and exploit this data. This is mainly because word sense ambiguity is a pervasive characteristic of natural language. For example, the word "cold" has several senses and may refer to a disease, a temperature sensation, or an environmental condition. The intended sense is determined by the textual context in which an instance of the ambiguous word appears: in "I am taking aspirin for my cold" the disease sense is intended, in "Let's go inside, I'm cold" the temperature sensation sense is meant, and "It's cold today, only 2 degrees" implies the environmental condition sense. Therefore, automatically identifying the intended sense of ambiguous words improves the correct interpretation of biomedical text data for clinical and biomedical applications.

deepBioWSD Network

This project addresses the substantial problem of WSD in NLP by introducing and developing a novel deep Bidirectional Long Short-Term Memory (BLSTM) network, and evaluates its accuracy for word sense disambiguation in the biomedical domain. First, we initialize the BLSTM network with pre-trained concept vectors (also known as concept embeddings). To compute these pre-trained concept embeddings, we use the Unified Medical Language System (UMLS) and MEDLINE abstracts, together with Pointwise Mutual Information (PMI) and Latent Semantic Analysis/Indexing (LSA/LSI). Then, we train the network on the biomedical textual data. Finally, we test the converged model on a holdout set. Experimental results on the MSH-WSD dataset (the MeSH WSD dataset from the National Library of Medicine, NLM) show that the introduced deep learning model outperforms state-of-the-art methods in terms of accuracy.
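To make the embedding step more concrete, here is a minimal sketch of one common way to turn co-occurrence statistics into concept vectors with PMI and LSA: build a positive PMI (PPMI) matrix from concept co-occurrence counts, then reduce it with a truncated singular value decomposition. It assumes a concept-by-context count matrix (for example, counts of UMLS concepts co-occurring in MEDLINE abstracts) has already been extracted. The toy counts and the function names `ppmi`, `lsa_embeddings`, and `cosine` are hypothetical and do not come from this repo, whose actual preprocessing, context windows, and embedding dimensions may differ. It requires NumPy.

```python
import numpy as np


def ppmi(counts: np.ndarray) -> np.ndarray:
    """Positive pointwise mutual information of a concept-by-context count matrix."""
    total = counts.sum()
    row_sums = counts.sum(axis=1, keepdims=True)  # how often each concept occurs
    col_sums = counts.sum(axis=0, keepdims=True)  # how often each context occurs
    with np.errstate(divide="ignore", invalid="ignore"):
        # PMI(c, x) = log( P(c, x) / (P(c) P(x)) ) = log( n(c, x) * N / (n(c) * n(x)) )
        pmi = np.log(counts * total / (row_sums * col_sums))
    pmi[~np.isfinite(pmi)] = 0.0  # zero co-occurrence gives -inf (or nan); treat as 0
    return np.maximum(pmi, 0.0)  # keep only positive associations


def lsa_embeddings(matrix: np.ndarray, dim: int) -> np.ndarray:
    """LSA via truncated SVD: one `dim`-dimensional vector per row (concept)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * s[:dim]  # scale each latent dimension by its singular value


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy counts: rows are concepts, columns are context concepts.
# Rows 0-1 (e.g. "cold" as a disease, aspirin) share one set of contexts;
# rows 2-3 (e.g. "cold" as a temperature, weather) share a disjoint set.
counts = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
], dtype=float)

vectors = lsa_embeddings(ppmi(counts), dim=2)
print(vectors.shape)                                   # (4, 2)
print(round(cosine(vectors[0], vectors[1]), 3))        # 1.0 (shared contexts)
print(round(abs(cosine(vectors[0], vectors[2])), 3))   # 0.0 (disjoint contexts)
```

Concepts that appear in similar contexts end up with nearly parallel vectors, which is what makes such vectors a useful starting point: a matrix like `vectors` could be loaded into a network's embedding layer before training, as described above for the BLSTM. How deepBioWSD actually performs that initialization is defined by the repository code, not by this sketch.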

Project Outcome

The outcome of this project is directly applicable to a wide range of NLP applications. These range from machine translation and automatic text summarization to information extraction and query answering in any given domain; they also cover specific tasks such as detecting adverse drug reactions in social media data and discovering associations among diagnosis codes in electronic medical records (EMR).

Cite

Please cite our papers, code, and dataset if you use them in your work.

deepBioWSD paper, together with the aforementioned code and dataset:

@article{pesaranghader2019deepbiowsd,
  title={deepBioWSD: effective deep neural word sense disambiguation of biomedical text data},
  author={Pesaranghader, Ahmad and Matwin, Stan and Sokolova, Marina and Pesaranghader, Ali},
  journal={Journal of the American Medical Informatics Association},
  volume={26},
  number={5},
  pages={438--446},
  year={2019},
  publisher={Oxford University Press}
}

Single Bidirectional LSTM for WSD:

@inproceedings{pesaranghader2018one,
  title={One single deep bidirectional LSTM network for word sense disambiguation of text data},
  author={Pesaranghader, Ahmad and Pesaranghader, Ali and Matwin, Stan and Sokolova, Marina},
  booktitle={Canadian Conference on Artificial Intelligence},
  pages={96--107},
  year={2018},
  organization={Springer}
}

