TextAnalysis

This is an R package documents the workflow of text mining, topic modeling, and sentiment analysis. Specifically, it was used for the project to analyze Twitter and news articles related to the Orlando Shooting incidence. At the time we work on this project, another shooting incidence occurred in Las Vegas. We also compared the response to Orlando Shooting v.s. Las Vegas Shooting.

This is version_1.

There are bugs. Will fix them.

Loading data

We store the LexisNexis news data into the SQLite database.

LexisNexis_Orlando <- read_LexisNexis("LexisNexis_v1.db", metadata=TRUE, format=TRUE)
document <- LexisNexis_Orlando$document
meta <- LexisNexis_Orlando$meta

Document-term matrix

The create_DTM function can create the document term matrix conveniently.

text_stemmed <- create_DTM(document, ID=ID, text=FULL_TEXT, n_gram=1, stemming=TRUE)
text_nonstemmed <- create_DTM(document, ID=ID, text=FULL_TEXT, n_gram=1, stemming=FALSE)

Visualize the top 10% frequent words

The nonstemmed version of text

freq_plot <- plot_word_freq(text_nonstemmed, q=0.1, display = TRUE)

Sentiment analysis

We can use extract_sentiment function to calculate the sentiment scores with different lexicons, for example, afinn and nrc.

stm_affin <- extract_sentiment(text, "afinn")
stm_nrc <- extract_sentiment(text, "nrc")

LDA topics

run_LDA wraps the results from Latent Dirichlet Allocation (LDA) model.

lda_result <- run_LDA(text_stem, topic_num = 6, topic_term_n = 20, q = 0.1)
Beta <- lda_result$Beta
Gamma <- lda_result$Gamma %>% rename(ID = document)
print(lda_result$topic_term_plot)

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
R		R
Result_11142017		Result_11142017
man		man
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LexisNexis_v1.db		LexisNexis_v1.db
LexisNexis_v2.db		LexisNexis_v2.db
NAMESPACE		NAMESPACE
README.md		README.md
TextAnalysis.Rproj		TextAnalysis.Rproj
result_1114.R		result_1114.R
script.r		script.r

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TextAnalysis

Loading data

Document-term matrix

Visualize the top 10% frequent words

Sentiment analysis

LDA topics

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

TextAnalysis

Loading data

Document-term matrix

Visualize the top 10% frequent words

Sentiment analysis

LDA topics

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages