
Commit

Merge pull request ethen8181#9 from JiaxiangBU/patch-1
Fix the probability formula for some topic given some words
ethen8181 authored Jul 25, 2020
2 parents 1f71423 + 2dc23e2 commit 8a7749a
Showing 2 changed files with 1,729 additions and 227 deletions.
7 changes: 4 additions & 3 deletions clustering_old/topic_model/LDA.Rmd
@@ -147,7 +147,7 @@ Notice that this random assignment already gives you both the topic representati
For each document d, go through each word w (a double for loop) and reassign a new topic to w: choose topic t with probability proportional to the probability of word w given topic t $\times$ the probability of topic t given document d, denoted by the following notation:

$$ P( z_i = j \text{ }| \text{ } z_{-i}, w_i, d_i )
- = \frac{ C^{WT}_{w_ij} + \eta }{ \sum^W_{ w = 1 }C^{WT}_{wj} + W\eta } \times
+ \propto \frac{ C^{WT}_{w_ij} + \eta }{ \sum^W_{ w = 1 }C^{WT}_{wj} + W\eta } \times
\frac{ C^{DT}_{d_ij} + \alpha }{ \sum^T_{ t = 1 }C^{DT}_{d_it} + T\alpha }
$$
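The switch from $=$ to $\propto$ is the substance of this patch: the product of the two ratios is an unnormalized weight, not a probability, and must be normalized (or passed to a sampler that normalizes internally) before drawing a topic. A small numeric sketch of that step, using hypothetical toy counts rather than anything from the repo:

```python
import numpy as np

rng = np.random.default_rng(4321)

# hypothetical toy counts: C_WT[w, t] = times word w is assigned topic t,
# C_DT[d, t] = times a word in document d is assigned topic t
C_WT = np.array([[3, 0], [1, 2], [0, 4]])
C_DT = np.array([[2, 4]])
W, T = C_WT.shape
eta, alpha = 0.001, 1.0

w_i, d_i = 1, 0  # the word / document position being resampled

# P(word w_i | topic) and P(topic | document d_i), smoothed by eta and alpha
left = (C_WT[w_i] + eta) / (C_WT.sum(axis=0) + W * eta)
right = (C_DT[d_i] + alpha) / (C_DT[d_i].sum() + T * alpha)

weights = left * right            # proportional only: does NOT sum to one
probs = weights / weights.sum()   # normalize explicitly before drawing
new_topic = rng.choice(T, p=probs)
```

R's `sample(..., prob = left * right)` normalizes the weights internally, which is why the R code below can pass the raw product; NumPy's `choice` requires `p` to sum to one, so the normalization is explicit here.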

@@ -195,6 +195,7 @@ left <- ( wt[, wid] + eta ) / ( rowSums(wt) + length(vocab) * eta )
right <- ( dt[1, ] + alpha ) / ( sum( dt[1, ] ) + K * alpha )
# draw new topic for the first word in the first document
+ # The optional prob argument can be used to give a vector of weights for obtaining the elements of the vector being sampled. They need not sum to one, but they should be non-negative and not all zero.
t1 <- sample(1:K, 1, prob = left * right)
t1
@@ -219,7 +220,7 @@ alpha <- 1
eta <- 0.001
iterations <- 1000
- source("/Users/ethen/machine-learning/clustering_old/topic_model/LDA_functions.R")
+ source("LDA_functions.R")
set.seed(4321)
lda1 <- LDA1( docs = docs, vocab = vocab,
K = K, alpha = alpha, eta = eta, iterations = iterations )
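`LDA1` is defined in the repo's `LDA_functions.R` and is not shown in this diff. As a rough illustration of what such a collapsed Gibbs sampler does, here is a minimal, hypothetical Python sketch (an assumption about the general algorithm, not the repo's implementation):

```python
import numpy as np

def lda_gibbs(docs, W, K, alpha, eta, iterations, seed=4321):
    """Minimal collapsed Gibbs sampler sketch for LDA.

    docs: list of documents, each a list of word ids in [0, W).
    Returns word-topic and document-topic count matrices."""
    rng = np.random.default_rng(seed)
    C_WT = np.zeros((W, K))           # word-topic counts
    C_DT = np.zeros((len(docs), K))   # document-topic counts
    z = []                            # current topic of every token
    for d, doc in enumerate(docs):    # random initial assignment
        zd = rng.integers(K, size=len(doc))
        z.append(zd)
        for w, t in zip(doc, zd):
            C_WT[w, t] += 1
            C_DT[d, t] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t_old = z[d][i]
                C_WT[w, t_old] -= 1   # remove the current assignment
                C_DT[d, t_old] -= 1
                # unnormalized weight per topic, as in the formula above
                left = (C_WT[w] + eta) / (C_WT.sum(axis=0) + W * eta)
                right = (C_DT[d] + alpha) / (C_DT[d].sum() + K * alpha)
                p = left * right
                t_new = rng.choice(K, p=p / p.sum())
                z[d][i] = t_new       # record the new assignment
                C_WT[w, t_new] += 1
                C_DT[d, t_new] += 1
    return C_WT, C_DT
```

After enough sweeps, normalizing the rows of `C_DT` (with the `alpha` smoothing) gives each document's topic distribution, and the columns of `C_WT` (with `eta`) give each topic's word distribution.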
@@ -326,4 +327,4 @@ sessionInfo()
- [Why tagging matters](http://cyber.law.harvard.edu/wg_home/uploads/507/07-WhyTaggingMatters.pdf)
- [LDA mathematical notations](https://www.cl.cam.ac.uk/teaching/1213/L101/clark_lectures/lect7.pdf)
- [Math free explanation of LDA](http://blog.echen.me/2011/08/22/introduction-to-latent-dirichlet-allocation/)
- - [Explanation of LDA's hyperparameter](http://stats.stackexchange.com/questions/37405/natural-interpretation-for-lda-hyperparameters/37444#37444)
+ - [Explanation of LDA's hyperparameter](http://stats.stackexchange.com/questions/37405/natural-interpretation-for-lda-hyperparameters/37444#37444)
1,949 changes: 1,725 additions & 224 deletions clustering_old/topic_model/LDA.html

Large diffs are not rendered by default.
