Commit 169c6bc (1 parent: 103e903)
Part 7 - Natural Language Processing/NLP.R
@@ -59,3 +59,13 @@ corpus = tm_map(corpus, stemDocument)
 # 6. Extra spaces -> remove (e.g. spaces left behind after removing numbers)
 corpus = tm_map(corpus, stripWhitespace)
 
+
+# Create the document-term matrix: one row per review, one column per word (a sparse matrix, i.e. mostly zeros)
+
+dtm = DocumentTermMatrix(corpus) # dtm is stored as a sparse matrix
+
+# Filter: drop rare words from dtm, keeping only the more frequent ones
+dtm = removeSparseTerms(dtm, 0.999) # keep only words that appear in more than 0.1% of the reviews (sparsity below 99.9%); the cutoff in documents is n_reviews * 0.001, so the fewer the reviews, the fewer words get dropped
+
+
+
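To see what the two new lines produce, here is a minimal base-R sketch that does not need tm. It builds a small document-term count matrix from a few hypothetical, already-cleaned reviews, then applies a sparsity filter that mirrors the rule `removeSparseTerms()` applies, as we understand it: a term is kept only if it occurs in more than `nrow(dtm) * (1 - sparse)` documents. The reviews and the helpers `build_dtm()` and `drop_sparse_terms()` are illustrative assumptions, not part of the original script, and the matrix is stored densely here, whereas tm stores it as a sparse (simple triplet) matrix.

```r
# Minimal base-R sketch (no tm required). Everything here is illustrative:
# the reviews and both helper functions are hypothetical.

# Hypothetical reviews, assumed already cleaned and stemmed by the tm_map() steps
reviews <- c("wow love place",
             "crust good",
             "tasti textur nasti",
             "love food good servic",
             "good food")

# Build a dense document-term count matrix: one row per review, one column per word
build_dtm <- function(docs) {
  tokens <- strsplit(docs, " ", fixed = TRUE)
  vocab <- sort(unique(unlist(tokens)))
  counts <- vapply(tokens,
                   function(tk) tabulate(match(tk, vocab), nbins = length(vocab)),
                   integer(length(vocab)))
  dtm <- t(counts)  # vapply returns terms x docs, so transpose to docs x terms
  dimnames(dtm) <- list(seq_along(docs), vocab)
  dtm
}

# Keep a term only if it appears in more than nrow(dtm) * (1 - sparse) documents,
# i.e. if its sparsity (share of documents without it) is below `sparse`
drop_sparse_terms <- function(dtm, sparse) {
  stopifnot(is.numeric(sparse), sparse > 0, sparse < 1)
  doc_freq <- colSums(dtm > 0)  # number of documents containing each term
  dtm[, doc_freq > nrow(dtm) * (1 - sparse), drop = FALSE]
}

dtm <- build_dtm(reviews)
print(mean(dtm == 0))                           # share of zero cells: 0.72
print(colnames(drop_sparse_terms(dtm, 0.999)))  # all 10 words kept
print(colnames(drop_sparse_terms(dtm, 0.7)))    # "food" "good" "love"
```

With these five reviews, `sparse = 0.999` puts the cutoff at 5 * 0.001 = 0.005 documents, so every word survives, while `sparse = 0.7` keeps only `food`, `good` and `love`, the words found in at least two reviews. This is why the same `sparse` value filters more aggressively on a larger corpus.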