An NLP and text analytics project on news articles for the UVa Master's of Data Science class Exploratory Text Analytics.
This project takes a CSV file of articles scraped from Daily Kos, Politico, and Powerline and runs a number of text analysis steps on them, including parsing and tokenization, TF-IDF, word embeddings, PCA, topic models, and sentiment analysis. It is an unsupervised learning project that explores what insights can be gleaned from applying these methods.
PCA is done with the scikit-learn library.
Sentiment analysis is done with VADER from NLTK.
TF-IDF is computed from scratch.
Word embeddings are done with word2vec in Gensim.
Topic modelling is done with the MALLET program.
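
Since TF-IDF is the one step computed without a library, a minimal sketch of what a from-scratch version could look like is given here. It is an illustration only and not the project's actual code: the function names tokenize and tf_idf, the regex tokenizer, the weighting (term count divided by document length, times log2(N / df)), and the toy documents are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed scheme)."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / document length, idf = log2(N / df).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        weights.append({
            term: (count / total) * math.log2(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy stand-in documents, not taken from the project's data.
docs = [
    "the senate passed the bill",
    "the governor vetoed the bill",
    "voters backed the governor",
]
scores = tf_idf(docs)
```

With this weighting, a term that appears in every document (such as "the" in the toy data) gets a weight of zero, while a term unique to one document gets the highest weight for its frequency. Other TF-IDF variants (smoothed idf, raw counts, natural log) would give different numbers but the same general ranking behaviour.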