Build word embeddings with a Keras implementation where the embedding vector is of length 50, 150, and 300. Use the Alice in Wonderland text (alice.txt) for training, and use a window size of 2 to train the embeddings (window size in the Jupyter notebook).
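As a starting point, the training data can be prepared by tokenizing the text and emitting (target, context) index pairs within the window. The helper below is a minimal sketch (function and variable names are our own; a notebook may equally use Keras' `Tokenizer` and `skipgrams` utilities):

```python
import re

def build_vocab_and_pairs(text, window_size=2):
    """Tokenize `text` and generate (target, context) index pairs where
    the context word lies within `window_size` positions of the target.
    A plain-Python sketch; assumes lowercase word tokens only."""
    tokens = re.findall(r"[a-z']+", text.lower())
    vocab = sorted(set(tokens))
    word2idx = {w: i for i, w in enumerate(vocab)}
    ids = [word2idx[w] for w in tokens]
    pairs = []
    for i, target in enumerate(ids):
        lo = max(0, i - window_size)
        hi = min(len(ids), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target itself
                pairs.append((target, ids[j]))
    return word2idx, pairs
```

For training, the pairs are used directly by a Skip-gram model (target predicts context); for CBOW the same windows are regrouped so the context words jointly predict the centre word.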
- Build word embeddings of length 50, 150 and 300 using the Skip-gram model.
- Build word embeddings of length 50, 150 and 300 using the CBOW model.

Analyze the different word embeddings:
Implement your own function to perform the analogy task as explained in Efficient estimation of word representations in vector space. Use the same distance metric as in the paper, without using existing libraries for this task such as Gensim. Your function should be able to answer whether an analogy such as the example below is true.
A king is to a queen as a man is to a woman, i.e.

    e_p ≈ e_king - e_queen + e_woman,

where e_x denotes the embedding of word x. We want to find the word p in the vocabulary whose embedding e_p is closest to the predicted embedding (i.e. the result of the formula). Then we can check whether p is the same word as the true word (man in the example above). Give at least 5 different examples of analogies. Compare the performance on the analogy task between the word embeddings and briefly discuss your results.
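The analogy function can be sketched as below, using cosine similarity (the metric used in the paper) against every row of the embedding matrix; all names (`analogy`, `emb`, `word2idx`, `idx2word`) are illustrative, and the three input words are excluded from the search as is common practice:

```python
import numpy as np

def analogy(emb, word2idx, idx2word, a, b, c, exclude_inputs=True):
    """Return the vocabulary word whose embedding has the highest cosine
    similarity to e_a - e_b + e_c (e.g. king - queen + woman -> man).
    `emb` is a (vocab_size, dim) matrix; no Gensim involved."""
    target = emb[word2idx[a]] - emb[word2idx[b]] + emb[word2idx[c]]
    # cosine similarity of the predicted vector with every embedding row
    norms = np.linalg.norm(emb, axis=1) * np.linalg.norm(target)
    sims = emb @ target / np.maximum(norms, 1e-10)
    if exclude_inputs:
        for w in (a, b, c):  # the query words themselves cannot be the answer
            sims[word2idx[w]] = -np.inf
    return idx2word[int(np.argmax(sims))]
```

The analogy holds if the returned word matches the expected one, so accuracy over a set of analogies is just the fraction of matches; running the same set against the Skip-gram and CBOW embeddings of each length gives the comparison asked for above.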