a-valvaq-2086/text_mining_spanish

text_mining_spanish

Text mining and sentiment analysis with R using the tidytext package - coursework (in Spanish)

Coursework for the Master's Degree in Big Data and Business Analytics at U.N.E.D. Since the syllabus is in Spanish, all the code and comments are left in Spanish.

The coursework consists of an R Markdown script that generates an HTML document. The document contains:

  1. An introduction explaining the tidy philosophy of the tidytext NLP package, which is a tidy (in the sense of H. Wickham's tidyverse) alternative to a popular NLP package.

  2. The script reads a Kaggle dataset containing the top 25 headlines for each of 1989 dates (from 2008 to 2017) from the Reddit r/worldnews subreddit.

  3. After cleaning and wrangling the data, I carried out a simple "static" sentiment analysis, i.e. one that does not analyse how the overall sentiment of the headlines changes over time. Studying that temporal variation is a possible line of future work (for example, seasonality of negative sentiment due to natural disasters, wars, general elections, financial crises, etc.).

  4. An additional line of future work is inspired by the Kaggle competition by Two Sigma, "Using News to predict stock movements": https://www.kaggle.com/c/two-sigma-financial-news.
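
As a rough illustration of steps 2 and 3, here is a minimal sketch of a tidy, "static" sentiment count with tidytext. It is not the coursework script: the `headlines` data frame, its column names and the example headlines are made up for illustration, and the choice of the Bing lexicon (which ships with tidytext) is an assumption, since the README does not say which lexicon was used.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the Kaggle data: one row per headline.
# Column names and contents are assumptions made for this sketch.
headlines <- tibble(
  date = as.Date(c("2008-08-08", "2008-08-08", "2008-08-11")),
  headline = c(
    "Terrible war breaks out after failed peace talks",
    "Scientists celebrate a wonderful breakthrough",
    "Markets crash amid fear of a financial crisis"
  )
)

# Tidy format: one word per row, lower-cased, punctuation stripped,
# with common English stop words removed.
tokens <- headlines %>%
  unnest_tokens(word, headline) %>%
  anti_join(stop_words, by = "word")

# "Static" sentiment: count positive and negative words over the
# whole corpus, ignoring the date column.
sentiment_counts <- tokens %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(sentiment)

print(sentiment_counts)
```

Keeping the `date` column through the pipeline means the temporal analysis mentioned as future work could start from the same `tokens` table, for instance by counting on `date` and `sentiment` together.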
