This R package documents a workflow for text mining, topic modeling, and sentiment analysis. Specifically, it was used in a project analyzing Twitter data and news articles related to the Orlando shooting incident. While we were working on this project, another shooting incident occurred in Las Vegas, so we also compared the responses to the Orlando shooting and the Las Vegas shooting.
This is version_1.
There are known bugs; they will be fixed.
The LexisNexis news data are stored in a SQLite database.
LexisNexis_Orlando <- read_LexisNexis("LexisNexis_v1.db", metadata=TRUE, format=TRUE)
document <- LexisNexis_Orlando$document
meta <- LexisNexis_Orlando$meta
The create_DTM function creates the document-term matrix.
text_stemmed <- create_DTM(document, ID=ID, text=FULL_TEXT, n_gram=1, stemming=TRUE)
text_nonstemmed <- create_DTM(document, ID=ID, text=FULL_TEXT, n_gram=1, stemming=FALSE)
The nonstemmed version of the text is used for the word-frequency plot.
freq_plot <- plot_word_freq(text_nonstemmed, q=0.1, display = TRUE)
We can use the extract_sentiment function to calculate sentiment scores with different lexicons, for example afinn and nrc.
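# Assumes the nonstemmed matrix from create_DTM, since sentiment lexicons list unstemmed words.
stm_afinn <- extract_sentiment(text_nonstemmed, "afinn")
stm_nrc <- extract_sentiment(text_nonstemmed, "nrc")
run_LDA wraps the results of a Latent Dirichlet Allocation (LDA) model.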
lda_result <- run_LDA(text_stemmed, topic_num = 6, topic_term_n = 20, q = 0.1)
Beta <- lda_result$Beta
library(dplyr)  # provides %>% and rename()
Gamma <- lda_result$Gamma %>% rename(ID = document)
print(lda_result$topic_term_plot)
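Renaming document to ID in Gamma suggests that the topic proportions are meant to be matched back to the article metadata. The following is a minimal base-R sketch of that idea on toy data, not the package's own method. It assumes Gamma is a long table with one row per document and topic and the columns ID, topic, and gamma; the toy tables and the SOURCE column are hypothetical.

```r
# Toy stand-in for lda_result$Gamma after rename(ID = document).
# The column names (ID, topic, gamma) are an assumption, not confirmed by the package.
Gamma_toy <- data.frame(
  ID    = rep(c("doc1", "doc2", "doc3"), each = 2),
  topic = rep(1:2, times = 3),
  gamma = c(0.9, 0.1, 0.3, 0.7, 0.6, 0.4)
)

# Toy stand-in for the metadata table; SOURCE is a hypothetical column.
meta_toy <- data.frame(
  ID     = c("doc1", "doc2", "doc3"),
  SOURCE = c("paper A", "paper B", "paper A")
)

# For each document, keep the row with the largest topic proportion.
by_doc <- split(Gamma_toy, Gamma_toy$ID)
dominant <- do.call(rbind, lapply(by_doc, function(d) d[which.max(d$gamma), ]))
rownames(dominant) <- NULL

# Attach the metadata through the shared ID column.
dominant_meta <- merge(dominant, meta_toy, by = "ID")
print(dominant_meta)
```

With the real objects, Gamma_toy and meta_toy would be replaced by Gamma and meta, provided meta also carries an ID column.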