GoogleTechTalks: Modeling Science: Dynamic Topic Models of Scholarly
Topic Models:
- automatically discover topics from a collection of documents
- automatically label images that are unlabeled
- model connections between topics (groups of words)
LDA (Latent Dirichlet allocation):
- Treat data as observations that arise from a generative probabilistic process that includes hidden variables
- For documents, the hidden variables reflect the thematic structure of the collection
General Idea:
- Cast the intuition about the data into a generative probabilistic process:
  - Each document is a random mixture of corpus-wide topics
  - Each word is drawn from one of these topics
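The intuition above can be sketched in a few lines of numpy. The vocabulary, the two topics, and the 70/30 mixture are illustrative toy values, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and topics (hypothetical, for illustration only).
vocab = ["gene", "dna", "data", "model"]
# Two corpus-wide topics: each is a distribution over the vocabulary.
topics = np.array([
    [0.5, 0.4, 0.05, 0.05],   # a "genetics"-like topic
    [0.05, 0.05, 0.5, 0.4],   # a "statistics"-like topic
])

# A document is a random mixture of these topics, e.g. 70% / 30%.
theta = np.array([0.7, 0.3])

# Generate each word: pick a topic from the mixture,
# then draw a word from that topic's distribution.
doc = []
for _ in range(10):
    z = rng.choice(2, p=theta)        # which topic this word comes from
    w = rng.choice(4, p=topics[z])    # word drawn from the chosen topic
    doc.append(vocab[w])
print(doc)
```

Running this produces a short document whose words are mostly drawn from the first topic, matching the 70/30 mixture.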
Algorithm variables:
- K: number of topics
- beta_k: for each topic k, a distribution over all the words in the vocabulary
- D: number of documents
- theta_d: for each document d, its topic proportions (a distribution over the K topics)
- N: number of words in a document
- z: for each word, the topic assignment it belongs to
- w: the observed word, looked up and drawn from the beta distribution of its assigned topic
Algorithm (informal) steps:
1. Draw each topic beta_i ~ Dir(eta), for i in {1, ..., K}
2. For each document:
   1. Draw topic proportions theta ~ Dir(alpha)
   2. For each word: draw a topic assignment z from theta, then draw the word from that topic's beta distribution
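The steps above can be sketched as a small simulation of the LDA generative process. The sizes and the hyperparameter values (eta, alpha) are illustrative assumptions, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(42)

K, V, D, N = 3, 8, 2, 6   # topics, vocabulary size, documents, words per doc
eta, alpha = 0.1, 0.5     # Dirichlet hyperparameters (illustrative values)

# Step 1: draw each topic beta_i ~ Dir(eta),
# i.e. K distributions over the V-word vocabulary.
beta = rng.dirichlet(np.full(V, eta), size=K)     # shape (K, V)

corpus = []
for d in range(D):
    # Step 2.1: draw this document's topic proportions theta ~ Dir(alpha).
    theta = rng.dirichlet(np.full(K, alpha))
    words = []
    for n in range(N):
        # Step 2.2: draw a topic assignment z from theta ...
        z = rng.choice(K, p=theta)
        # ... then draw the word from that topic's beta distribution.
        w = rng.choice(V, p=beta[z])
        words.append(w)
    corpus.append(words)
print(corpus)
```

Each row of `beta` sums to 1, and every generated word id indexes into the shared vocabulary, which is what lets topics be compared across documents.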
Vocabulary:
- Generative model: a model for randomly generating observable data, typically given some hidden parameters. It specifies a joint probability distribution over observations and label sequences
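A minimal example of that definition: a hypothetical two-state model (the prior and emission numbers are made up) where the joint distribution over the hidden label z and the observed symbol x factorizes as p(x, z) = p(z) * p(x | z):

```python
import numpy as np

# Hypothetical two-state model: hidden label z, observed symbol x.
p_z = np.array([0.6, 0.4])          # prior over hidden labels
p_x_given_z = np.array([
    [0.9, 0.1],                     # emission distribution for z = 0
    [0.2, 0.8],                     # emission distribution for z = 1
])

# The generative model specifies the full joint p(x, z) = p(z) * p(x | z).
joint = p_z[:, None] * p_x_given_z  # shape (2 labels, 2 symbols)

# The joint sums to 1, and it supports both generating data
# (sample z, then x) and scoring any (x, z) pair.
print(joint)
print(joint.sum())
```

Because the model gives the joint distribution rather than just p(z | x), it can be run "forwards" to generate data, which is exactly how LDA is used above.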
