
Generating Word2Vec (text embeddings) for large-scale heterogeneous networks

Text embedding is the task of generating embeddings (low-dimensional vector representations, in the style of Word2Vec) for text. Several techniques exist for this, but they do not scale very well. The technique presented here is highly scalable and works well on heterogeneous graphs.

This is an implementation of the following paper: PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks, by Jian Tang, Meng Qu, and Qiaozhu Mei. KDD '15, August 10-13, 2015, Sydney, NSW, Australia.
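
For orientation, the paper constructs three bipartite networks from the labeled corpus (word-word co-occurrences within a sliding window, word-document term counts, and word-label term counts) and embeds all three jointly. Below is a minimal sketch of that construction step; the `build_networks` helper and all names are illustrative, not this repo's actual code:

```python
from collections import Counter

def build_networks(docs, labels, window=5):
    """docs: list of token lists; labels: one class label per document.
    Returns edge-weight Counters for the word-word, word-document,
    and word-label bipartite networks used by PTE."""
    ww, wd, wl = Counter(), Counter(), Counter()
    for doc_id, (tokens, label) in enumerate(zip(docs, labels)):
        for i, w in enumerate(tokens):
            # word-word edges: co-occurrence inside a sliding window
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    ww[(w, tokens[j])] += 1
            # word-document edge: term frequency in this document
            wd[(w, doc_id)] += 1
            # word-label edge: term frequency aggregated over the label
            wl[(w, label)] += 1
    return ww, wd, wl

if __name__ == "__main__":
    docs = [["great", "movie", "great", "acting"], ["dull", "boring", "movie"]]
    ww, wd, wl = build_networks(docs, labels=["pos", "neg"], window=2)
    print(wl[("great", "pos")])  # -> 2
```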

Using the text embeddings generated by the algorithm, we performed sentiment analysis on movie review data and obtained ~89% accuracy, matching the results described in the paper.
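
For context, a document vector in PTE is the average of its word vectors, with a linear classifier trained on top. Here is a toy sketch of that pipeline; it uses scikit-learn purely for illustration (the repo itself depends only on NumPy, SciPy, and Theano), and random vectors stand in for the embeddings produced by train.py:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def doc_vector(tokens, embeddings, dim=100):
    """Average the embeddings of in-vocabulary tokens (PTE's document representation)."""
    vecs = [embeddings[w] for w in tokens if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Toy end-to-end run with random vectors standing in for train.py's output.
rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=100) for w in ["great", "awful", "movie"]}
docs = [["great", "movie"], ["awful", "movie"]] * 10
labels = [1, 0] * 10
X = np.stack([doc_vector(d, embeddings) for d in docs])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.score(X, labels))  # 1.0 on this separable toy data
```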

Dependencies:

1. NumPy
2. SciPy
3. Theano

Steps to run the code:

python train.py
python test.py

Link for the dataset used: http://ai.stanford.edu/~amaas/data/sentiment/
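
The archive (aclImdb_v1.tar.gz) unpacks to aclImdb/{train,test}/{pos,neg}/*.txt. Below is a minimal sketch for reading one split; the `load_split` helper is illustrative and not part of this repo:

```python
from pathlib import Path

def load_split(root, split):
    """Yield (review_text, label) pairs for the 'train' or 'test' split."""
    for label in ("pos", "neg"):
        for path in sorted(Path(root, split, label).glob("*.txt")):
            yield path.read_text(encoding="utf-8"), label

if __name__ == "__main__":
    # Print the label and the start of the first training review.
    for text, label in load_split("aclImdb", "train"):
        print(label, text[:60])
        break
```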
