Unofficial PyTorch implementation of the paper "Efficient Estimation of Word Representations in Vector Space", with code for training the model and demonstrating the properties of the trained model. The focus is on the Skip-gram model only.
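For readers new to Skip-gram, the model is trained to predict the words surrounding a given center word. Below is a minimal, standard-library sketch of how such (center, context) training pairs can be generated. The function name `skipgram_pairs`, the fixed window size, and the example tokens are hypothetical and not taken from this repository; the notebooks may prepare data differently.

```python
from typing import Iterator, List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> Iterator[Tuple[str, str]]:
    """Yield (center, context) pairs for each token and its neighbors within `window`.

    Assumption: a fixed symmetric window, which is a simplification.
    """
    for i, center in enumerate(tokens):
        lo = max(0, i - window)                # left edge of the context window
        hi = min(len(tokens), i + window + 1)  # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                         # skip the center word itself
                yield center, tokens[j]


# Hypothetical tokenized sentence, for illustration only.
pairs = list(skipgram_pairs(["soft", "cotton", "summer", "dress"], window=1))
print(pairs)
```

Each pair becomes one training example: the center word is the input and the context word is the prediction target.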
Key files:
- `word2vec.pth` is a model pre-trained on the Amazon Fashion dataset with a 4000-word vocabulary,
- `inference.ipynb` contains the playground and demonstrates some properties of the model,
- `train.ipynb` trains word2vec from scratch; use it to customize the training process,
- `extra/cloud.svg` shows a t-SNE visualization of the most distinct word clusters.
git clone https://github.com/tejpaper/word2vec.git
cd word2vec
pip install -r requirements.txt

| Category |
| --- |
| Emotions and feelings |
| Family |
| Seasons |
| Numbers |
| Colors |
| Body parts |
| Clothes |
| Sizes |

MIT