Text mining and sentiment analysis with R using the tidytext package - coursework (in Spanish)
Coursework for the Master's Degree in Big Data and Business Analytics at U.N.E.D. Since the syllabus is in Spanish, all the code and comments are left in Spanish.
The coursework consists of an RMarkdown script that generates an HTML document. The document contains:
- An introduction explaining the tidy philosophy behind the tidytext NLP package, which is a tidy (as per H. Wickham's tidyverse) alternative to a popular NLP package.
- The script reads a Kaggle dataset containing the top 25 headlines from the Reddit r/worldnews subreddit for each of 1,989 dates (from 2008 to 2017).
- After cleaning and wrangling the data, I carried out a simple "static" sentiment analysis, i.e. one that does not analyse how the overall sentiment of the headlines changes through time. Analysing that temporal variation could be a future line of work (for example, seasonality of negative sentiment due to natural disasters, wars, general elections, financial crises, etc.).
- An additional line of future work is inspired by the Kaggle competition by Two Sigma, "Using News to Predict Stock Movements": https://www.kaggle.com/c/two-sigma-financial-news.
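
The wrangling and "static" sentiment steps described above can be illustrated with a minimal sketch. It is not the coursework script itself: the toy data frame, its wide layout (`Date`, `Top1`, `Top2`), the variable names, and the choice of the Bing lexicon are all assumptions made for illustration, and the comments are in English rather than Spanish.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical toy data standing in for the Kaggle file.
# The wide layout (one column per headline rank) is an assumption.
news_wide <- tibble(
  Date = as.Date(c("2008-08-08", "2008-08-11")),
  Top1 = c("Deadly earthquake destroys homes and kills dozens",
           "Peace talks bring hope and relief to the region"),
  Top2 = c("Markets crash amid fears of a financial crisis",
           "Scientists celebrate a successful rocket launch")
)

# Wrangle to the tidy format: one row per headline.
news_long <- news_wide %>%
  pivot_longer(cols = starts_with("Top"),
               names_to = "rank",
               values_to = "headline")

# Tokenise to one word per row and drop common English stop words.
tokens <- news_long %>%
  unnest_tokens(word, headline) %>%
  anti_join(stop_words, by = "word")

# "Static" sentiment: count positive and negative words over the whole
# corpus, ignoring the date. The Bing lexicon is an assumed choice.
sentiment_counts <- tokens %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(sentiment, sort = TRUE)

print(sentiment_counts)
```

Because `unnest_tokens()` keeps the `Date` column, replacing the last step with `count(Date, sentiment)` would be one possible starting point for the temporal analysis mentioned as future work.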