
Commit 169c6bc

Creating the sparse matrix and filtering the most frequent words
1 parent 103e903 commit 169c6bc


1 file changed: 10 additions, 0 deletions

Part 7 - Natural Language Processing/NLP.R

@@ -59,3 +59,13 @@ corpus = tm_map(corpus, stemDocument)
 # 6. extra spaces -> remove (extra spaces left from removing numbers for example)
 corpus = tm_map(corpus, stripWhitespace)
 
+
+# Creating the sparse matrix (most entries are zero: each review uses only a few of the corpus's words)
+
+dtm = DocumentTermMatrix(corpus) # dtm = document-term matrix, stored in sparse form
+
+# Filter: keep only the most frequent words in dtm
+dtm = removeSparseTerms(dtm, 0.999) # drop terms absent from at least 99.9% of documents, i.e. keep words found in more than 0.1% of the reviews; the cut-off scales with the number of reviews, so fewer reviews -> fewer words removed
+
+
+
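To make "very few non-zero values" concrete, here is a minimal base-R sketch of what a document-term matrix holds: one row per document, one column per distinct term, each cell counting occurrences. It does not use tm (tm's `DocumentTermMatrix` stores the same counts in the sparse triplet format of the slam package); the three toy "reviews" are hypothetical, already-cleaned and stemmed text.

```r
# Hypothetical toy corpus: three reviews after cleaning and stemming
docs = c("wow love place",
         "crust not good",
         "not tasti textur nasti")

# Split each document into terms on single spaces
tokens = strsplit(docs, " ", fixed = TRUE)

# Vocabulary: every distinct term across the corpus, sorted
vocab = sort(unique(unlist(tokens)))

# Count each vocabulary term per document; vapply returns terms x documents,
# so transpose to get the usual documents x terms layout
dtm = t(vapply(tokens,
               function(tk) as.integer(table(factor(tk, levels = vocab))),
               integer(length(vocab))))
dimnames(dtm) = list(paste0("doc", seq_along(docs)), vocab)

# Sparsity: share of cells that are zero
sparsity = mean(dtm == 0)

print(dtm)
cat(sprintf("sparsity: %.0f%%\n", 100 * sparsity))
```

For this toy input only 10 of the 27 cells are non-zero, so the sparsity is about 63%; when a tm `DocumentTermMatrix` is printed it reports a comparable "Sparsity" figure. With a realistic vocabulary of thousands of terms the share of zeros is far higher, which is why a sparse representation is used.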

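The `0.999` passed to `removeSparseTerms` is easy to misread as "keep 99.9% of the words". tm's documentation describes the result as keeping only terms whose sparsity (the share of documents in which the term does not occur) is below `sparse`. The base-R sketch below reimplements that rule so the arithmetic is visible; `keep_frequent_terms` and the toy matrix are hypothetical stand-ins for calling `removeSparseTerms` on a real `DocumentTermMatrix`.

```r
# Hypothetical helper mirroring the documented removeSparseTerms rule:
# keep a term only if it occurs in more than nrow * (1 - sparse) documents
keep_frequent_terms = function(dtm, sparse) {
  stopifnot(sparse > 0, sparse < 1)
  doc_freq = colSums(dtm > 0)  # number of documents containing each term
  dtm[, doc_freq > nrow(dtm) * (1 - sparse), drop = FALSE]
}

# Toy matrix with 1000 documents and three terms of different frequency
n_docs = 1000
m = matrix(0L, nrow = n_docs, ncol = 3,
           dimnames = list(NULL, c("rare", "pair", "common")))
m[1, "rare"] = 1L          # occurs in 1 document
m[1:2, "pair"] = 1L        # occurs in 2 documents
m[1:500, "common"] = 1L    # occurs in 500 documents

# 1000 documents: cut-off is 1000 * 0.001 = 1, so "rare" is dropped
kept_1000 = colnames(keep_frequent_terms(m, 0.999))
print(kept_1000)  # "pair" "common"

# 100 documents: cut-off is 0.1, so every term that occurs at all is kept
kept_100 = colnames(keep_frequent_terms(m[1:100, ], 0.999))
print(kept_100)   # "rare" "pair" "common"
```

The same `sparse` value therefore removes fewer words from a smaller corpus: the threshold is a fraction of the number of documents, not a fraction of the vocabulary.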