Preprocessing scripts to read definitions and other information from dictionaries. This repository accompanies the AAAI 2017 paper "Definition Modeling: Learning to define word embeddings in natural language".
- Wordnik provides an API to get word definitions and other information from multiple dictionaries. You will need an API key to access it (see the Developer site).
- GCIDE, the GNU Collaborative International Dictionary of English, contains entries mostly from Webster. This project uses a pre-processed version of the original release, which can be found here.
- WordNet contains about 150,000 words and phrases. This project uses NLTK to read data from WordNet.
- HillF_TACL2016 provides more than 800k definitions from the Wordnik API, along with word embeddings. This data accompanies this paper.
For details of the data, see Data.
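
As an illustration of the WordNet source, here is a minimal sketch of reading definitions through NLTK. It is not the repository's actual script: the function names `wordnet_lookup` and `collect_definitions` and the tab-separated output layout are assumptions made for this example. Only `wn.synsets`, `Synset.pos`, and `Synset.definition` are real NLTK calls.

```python
from typing import Callable, Iterable, List, Tuple

# A lookup maps a word to a list of (part_of_speech, definition) pairs.
Lookup = Callable[[str], List[Tuple[str, str]]]


def wordnet_lookup(word: str) -> List[Tuple[str, str]]:
    """Return (part_of_speech, definition) pairs for `word` from WordNet."""
    # Imported lazily so the rest of the sketch can be used without NLTK.
    from nltk.corpus import wordnet as wn

    return [(synset.pos(), synset.definition()) for synset in wn.synsets(word)]


def collect_definitions(words: Iterable[str], lookup: Lookup = wordnet_lookup) -> List[str]:
    """Build one tab-separated line per sense: word, part of speech, definition.

    The output layout is hypothetical and chosen only for illustration.
    Words with no senses in the dictionary produce no lines.
    """
    rows = []
    for word in words:
        for pos, definition in lookup(word):
            rows.append(f"{word}\t{pos}\t{definition}")
    return rows
```

To run this against WordNet itself, NLTK must be installed and the corpus downloaded once with `nltk.download("wordnet")`; after that, `collect_definitions(["dictionary"])` returns one line per WordNet sense of the word. Passing a different `lookup` function lets the same loop be reused for another dictionary source.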