This repository was archived by the owner on Jan 26, 2021. It is now read-only.

Support asymmetric Dirichlet prior #22

Open
wants to merge 3 commits into base: master

Conversation

hiyijian
Contributor

According to Wallach’s paper, combining an asymmetric, hierarchical Dirichlet prior over the document–topic distributions with a symmetric Dirichlet prior over the topic–word distributions results in a significantly better model.
This PR adds support for an asymmetric alpha in the following steps:

  1. Add two extra tables to Multiverso. One is the topic frequency table, a matrix that counts each topic's frequency. The other is the doc length table, a row that counts how many documents have length k.
  2. Initialize the two extra tables from the randomly initialized documents.
  3. Learn the alpha distribution from the two extra tables every 5 iterations.
  4. Build an alias table for the learned alpha distribution.
  5. Sample topics with the learned alpha distribution and the alias table. Meanwhile, update the counts in the topic frequency table if necessary.
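
Editor's sketch of step 3, under stated assumptions. It assumes the topic frequency table stores, for each topic k and count n, the number of documents in which topic k is assigned to exactly n tokens. The PR text does not spell this layout out, so treat it as one plausible reading. The update shown is Minka's fixed-point iteration for a Dirichlet-multinomial, computed from histograms. It is not necessarily what this PR implements. The names `LearnAlpha`, `topic_hist`, `doc_len_hist`, and `kMinAlpha` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not taken from the PR.
// topic_hist[k][n]: number of documents in which topic k is assigned to exactly n tokens.
// doc_len_hist[n]:  number of documents that contain exactly n tokens.
// alpha:            current (strictly positive) prior, one entry per topic.
// Runs `sweeps` fixed-point updates and returns the new alpha vector.
std::vector<double> LearnAlpha(
    const std::vector<std::vector<long long>>& topic_hist,
    const std::vector<long long>& doc_len_hist,
    std::vector<double> alpha, int sweeps) {
  assert(topic_hist.size() == alpha.size());
  const double kMinAlpha = 1e-10;  // assumed floor so alpha never reaches zero
  for (int s = 0; s < sweeps; ++s) {
    const double alpha_sum = std::accumulate(alpha.begin(), alpha.end(), 0.0);
    // Denominator: sum over documents of digamma(len + alpha_sum) - digamma(alpha_sum),
    // using digamma(n + a) - digamma(a) = sum_{i=0}^{n-1} 1 / (i + a).
    double denom = 0.0;
    double running = 0.0;
    for (std::size_t n = 1; n < doc_len_hist.size(); ++n) {
      running += 1.0 / (static_cast<double>(n - 1) + alpha_sum);
      denom += static_cast<double>(doc_len_hist[n]) * running;
    }
    if (denom <= 0.0) break;  // no tokens observed, nothing to learn
    for (std::size_t k = 0; k < alpha.size(); ++k) {
      // Numerator: the same digamma difference, per topic, from its histogram row.
      double numer = 0.0;
      running = 0.0;
      for (std::size_t n = 1; n < topic_hist[k].size(); ++n) {
        running += 1.0 / (static_cast<double>(n - 1) + alpha[k]);
        numer += static_cast<double>(topic_hist[k][n]) * running;
      }
      alpha[k] = std::max(alpha[k] * numer / denom, kMinAlpha);
    }
  }
  return alpha;
}
```

Working from the two histograms means each sweep costs time proportional to the histogram sizes rather than to the number of documents. For example, with four documents of length 4 in which topic 0 takes three tokens and topic 1 takes one, a single sweep from alpha = {0.5, 0.5} raises alpha for topic 0 and lowers it for topic 1.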

To use this new feature, please just run with an extra option "-num_alpha_iterations".

Please note that there are two TODOs: evaluation in asymmetric prior mode, and inference with an asymmetric prior.


if(Config::asymmetric_prior)
{
// Request topic-frequency-table
Contributor

Do we only need to request the table when iter % num_alpha_iteration == 0?
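
Editor's sketch of the gating suggested in this comment, assuming the second argument is the period between alpha updates. The helper name `IsAlphaIteration` is hypothetical and does not come from the PR. The idea is to request the topic frequency table only on iterations where alpha is actually re-learned.

```cpp
#include <cassert>

// Hypothetical helper: returns true only on iterations where alpha is
// re-estimated, so the extra table request can be skipped otherwise.
// A non-positive period is treated as "never re-estimate" (an assumption).
inline bool IsAlphaIteration(int iter, int num_alpha_iteration) {
  assert(iter >= 0);
  return num_alpha_iteration > 0 && iter % num_alpha_iteration == 0;
}
```

With such a check, the request in the snippet above could be guarded by both `Config::asymmetric_prior` and `IsAlphaIteration(iter, num_alpha_iteration)`, which would avoid fetching the table on iterations that do not use it.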

@feiga
Contributor

feiga commented Jan 25, 2016

Thanks for the great work, @hiyijian! I'm really sorry for my late response. The implementation should be OK. I reviewed the code and added some notes. I think it's OK to merge into master.

@lisendong

@hiyijian I'm trying to use asymmetric LDA. Could you tell me your QQ or WeChat? I have some questions.

@hiyijian
Contributor Author

@lisendong My implementation is built on top of the source code provided by @feiga, which I think is used in Microsoft only. However, I think my implementation is incorrect. Feel free to contact me via @hiyijian@qq.com

@tangzhenyu

@hiyijian I'm trying to use your asymmetric prior version of LightLDA, which is based on Microsoft's code. I see that you said "my implementation is incorrect". Does that mean your code still has some bugs?

@koustuvsinha

Hi @hiyijian @feiga, why is this not merged yet? Are there any issues with the implementation?

@hiyijian
Contributor Author

Hi @tangzhenyu and @koustuvsinha. I said "this implementation is incorrect" because I have not seen any improvement compared with feiga's original symmetric LDA. Sometimes it even seems worse :( .


7 participants