Latent Dirichlet Allocation

Code for HKUST MATH 5472 Final Project

Python implementation of LDA, based on the paper "Latent Dirichlet Allocation".

Usage

from preprocess import *
from main import *

corpus = preprocessing(M=200)

preprocess.py provides text preprocessing for the Associated Press (AP) corpus. preprocessing processes the first M documents in the AP corpus and returns a list in which each element represents one document, coded by {0,1}.

alpha, beta = LDA.parameter_estimation(corpus, k=10, tol=1e-6, max_iter=100)

LDA.parameter_estimation performs variational EM to estimate the Dirichlet parameter alpha and the word probabilities beta. The number of topics k must be given.
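
For orientation, here is a minimal sketch of the per-document variational E-step that a variational EM loop of this kind typically alternates with the parameter updates. It is not the code in main.py: the names digamma and e_step are hypothetical, the document is assumed to be a plain list of word indices (not the {0,1} coding used by preprocess.py), and beta is assumed to be a k x V table of strictly positive word probabilities. It uses only the standard library, so the digamma function is approximated by hand.

```python
import math


def digamma(x):
    """Approximate the digamma function for x > 0.

    Shifts x upward with psi(x) = psi(x + 1) - 1/x, then applies the
    standard asymptotic series, which is accurate for x >= 6.
    """
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def e_step(doc, alpha, beta, tol=1e-6, max_iter=100):
    """Variational E-step for one document (hypothetical helper).

    doc   : list of word indices (assumed representation)
    alpha : list of k Dirichlet parameters
    beta  : k lists of V word probabilities, one list per topic
    Returns (gamma, phi): the variational Dirichlet parameters and the
    per-word topic responsibilities.
    """
    k, n = len(alpha), len(doc)
    phi = [[1.0 / k] * k for _ in range(n)]
    gamma = [a + n / k for a in alpha]
    for _ in range(max_iter):
        new_gamma = list(alpha)
        for i, w in enumerate(doc):
            # phi_ij is proportional to beta_jw * exp(psi(gamma_j));
            # the psi(sum(gamma)) term is constant in j and cancels.
            weights = [beta[j][w] * math.exp(digamma(gamma[j])) for j in range(k)]
            total = sum(weights)
            phi[i] = [x / total for x in weights]
            for j in range(k):
                new_gamma[j] += phi[i][j]
        change = max(abs(a - b) for a, b in zip(gamma, new_gamma))
        gamma = new_gamma
        if change < tol:
            break
    return gamma, phi


# Toy usage: two topics, a vocabulary of two words.
gamma, phi = e_step([0, 0, 0, 1], [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
```

In a full EM loop, the M-step would then re-estimate beta from the phi values collected over all documents and update alpha from the gamma values; the tol and max_iter arguments here only mirror the names used in the call above and control this inner loop alone.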

Examples

See the notebooks sim_data.ipynb and ap_modeling.ipynb to run the examples from the report lda.pdf.

zihaophys/latent_dirichlet_allocation
