
Commit

Merge pull request ethen8181#9 from JiaxiangBU/patch-1
Fix the probability formula for some topic given some words
ethen8181 authored Jul 25, 2020
2 parents 1f71423 + 2dc23e2 commit 8a7749a
Showing 2 changed files with 1,729 additions and 227 deletions.
7 changes: 4 additions & 3 deletions clustering_old/topic_model/LDA.Rmd
@@ -147,7 +147,7 @@ Notice that this random assignment already gives you both the topic representati
For each document d, go through each word w (a double for loop) and reassign a new topic to w: choose topic t with probability proportional to the probability of word w given topic t $\times$ the probability of topic t given document d, denoted by the following notation:

$$ P( z_i = j \text{ }| \text{ } z_{-i}, w_i, d_i )
- = \frac{ C^{WT}_{w_ij} + \eta }{ \sum^W_{ w = 1 }C^{WT}_{wj} + W\eta } \times
+ \propto \frac{ C^{WT}_{w_ij} + \eta }{ \sum^W_{ w = 1 }C^{WT}_{wj} + W\eta } \times
\frac{ C^{DT}_{d_ij} + \alpha }{ \sum^T_{ t = 1 }C^{DT}_{d_it} + T\alpha }
$$
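The switch from $=$ to $\propto$ is the substance of this patch: the product of the two ratios is an unnormalized weight, not a probability, and must be normalized (or passed to a sampler that normalizes internally) before drawing a topic. A small numeric sketch of that step, using hypothetical toy counts rather than anything from the repo:

```python
import numpy as np

rng = np.random.default_rng(4321)

# hypothetical toy counts: C_WT[w, t] = times word w is assigned topic t,
# C_DT[d, t] = times a word in document d is assigned topic t
C_WT = np.array([[3, 0], [1, 2], [0, 4]])
C_DT = np.array([[2, 4]])
W, T = C_WT.shape
eta, alpha = 0.001, 1.0

w_i, d_i = 1, 0  # the word / document position being resampled

# P(word w_i | topic) and P(topic | document d_i), smoothed by eta and alpha
left = (C_WT[w_i] + eta) / (C_WT.sum(axis=0) + W * eta)
right = (C_DT[d_i] + alpha) / (C_DT[d_i].sum() + T * alpha)

weights = left * right            # proportional only: does NOT sum to one
probs = weights / weights.sum()   # normalize explicitly before drawing
new_topic = rng.choice(T, p=probs)
```

R's `sample(..., prob = left * right)` normalizes the weights internally, which is why the R code below can pass the raw product; NumPy's `choice` requires `p` to sum to one, so the normalization is explicit here.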

@@ -195,6 +195,7 @@ left <- ( wt[, wid] + eta ) / ( rowSums(wt) + length(vocab) * eta )
right <- ( dt[1, ] + alpha ) / ( sum( dt[1, ] ) + K * alpha )
# draw new topic for the first word in the first document
+ # The optional prob argument can be used to give a vector of weights for obtaining the elements of the vector being sampled. They need not sum to one, but they should be non-negative and not all zero.
t1 <- sample(1:K, 1, prob = left * right)
t1
@@ -219,7 +220,7 @@ alpha <- 1
eta <- 0.001
iterations <- 1000
- source("/Users/ethen/machine-learning/clustering_old/topic_model/LDA_functions.R")
+ source("LDA_functions.R")
set.seed(4321)
lda1 <- LDA1( docs = docs, vocab = vocab,
K = K, alpha = alpha, eta = eta, iterations = iterations )
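`LDA1` is defined in the repo's `LDA_functions.R` and is not shown in this diff. As a rough illustration of what such a collapsed Gibbs sampler does, here is a minimal, hypothetical Python sketch (an assumption about the general algorithm, not the repo's implementation):

```python
import numpy as np

def lda_gibbs(docs, W, K, alpha, eta, iterations, seed=4321):
    """Minimal collapsed Gibbs sampler sketch for LDA.

    docs: list of documents, each a list of word ids in [0, W).
    Returns word-topic and document-topic count matrices."""
    rng = np.random.default_rng(seed)
    C_WT = np.zeros((W, K))           # word-topic counts
    C_DT = np.zeros((len(docs), K))   # document-topic counts
    z = []                            # current topic of every token
    for d, doc in enumerate(docs):    # random initial assignment
        zd = rng.integers(K, size=len(doc))
        z.append(zd)
        for w, t in zip(doc, zd):
            C_WT[w, t] += 1
            C_DT[d, t] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t_old = z[d][i]
                C_WT[w, t_old] -= 1   # remove the current assignment
                C_DT[d, t_old] -= 1
                # unnormalized weight per topic, as in the formula above
                left = (C_WT[w] + eta) / (C_WT.sum(axis=0) + W * eta)
                right = (C_DT[d] + alpha) / (C_DT[d].sum() + K * alpha)
                p = left * right
                t_new = rng.choice(K, p=p / p.sum())
                z[d][i] = t_new       # record the new assignment
                C_WT[w, t_new] += 1
                C_DT[d, t_new] += 1
    return C_WT, C_DT
```

After enough sweeps, normalizing the rows of `C_DT` (with the `alpha` smoothing) gives each document's topic distribution, and the columns of `C_WT` (with `eta`) give each topic's word distribution.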
@@ -326,4 +327,4 @@ sessionInfo()
- [Why tagging matters](http://cyber.law.harvard.edu/wg_home/uploads/507/07-WhyTaggingMatters.pdf)
- [LDA mathematical notations](https://www.cl.cam.ac.uk/teaching/1213/L101/clark_lectures/lect7.pdf)
- [Math free explanation of LDA](http://blog.echen.me/2011/08/22/introduction-to-latent-dirichlet-allocation/)
- - [Explanation of LDA's hyperparameter](http://stats.stackexchange.com/questions/37405/natural-interpretation-for-lda-hyperparameters/37444#37444)
+ - [Explanation of LDA's hyperparameter](http://stats.stackexchange.com/questions/37405/natural-interpretation-for-lda-hyperparameters/37444#37444)
1,949 changes: 1,725 additions & 224 deletions clustering_old/topic_model/LDA.html

Large diffs are not rendered by default.
