johnny161/Text-Clustering

Text-Clustering

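Multi-document topic clustering based on K-Means and LDA models. Given a set of documents as input, it outputs the keywords and the corresponding texts for each topic, and can be used for topic discovery and hot-topic analysis.
Because clustering quality, unlike accuracy, has no single numeric value to compare against, this experiment does not modify the K-Means algorithm itself. Instead, it settles on a roughly reasonable value of K by looking at three things: the within-cluster loss, the silhouette coefficient, and a two-dimensional visualization.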
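
As a rough illustration of the K-selection idea described above, the sketch below fits K-Means for several values of K on TF-IDF features and records the within-cluster loss (`inertia_`) and the silhouette coefficient for each. It is not the repository's code: the toy English corpus, the helper name `evaluate_k`, and the range of K are assumptions made for the example, and Chinese text would first need word segmentation. The two-dimensional visualization step is left out.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical toy corpus: two obvious topics (finance, football).
docs = [
    "stock market shares investors trading",
    "market investors stock prices trading",
    "shares prices stock market rally",
    "football match goal team league",
    "team league football players goal",
    "match players goal football coach",
]

# Turn the documents into a sparse TF-IDF matrix (one row per document).
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)


def evaluate_k(X, k_values, seed=0):
    """Fit K-Means for each K; return {K: (inertia, silhouette, labels)}."""
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        sil = silhouette_score(X, model.labels_)
        results[k] = (model.inertia_, sil, model.labels_)
    return results


scores = evaluate_k(X, range(2, 5))
for k, (inertia, sil, _) in scores.items():
    print(f"K={k}  inertia={inertia:.3f}  silhouette={sil:.3f}")

# One possible heuristic: take the K with the highest silhouette coefficient.
best_k = max(scores, key=lambda k: scores[k][1])
print("K chosen by silhouette:", best_k)
```

Inertia tends to keep falling as K grows, so it is usually read as an "elbow" curve rather than minimized, while the silhouette coefficient can be compared directly across values of K. Taking the maximum silhouette is only one heuristic; as the text above says, the final choice is a judgment made across all three views.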

Github Reference:

https://github.com/liuhuanyong/TopicCluster

Blog Reference:

Kmeans:

Extracting TF-IDF features from text with sklearn: https://www.jianshu.com/p/c7e2771eccaa
Computing TF-IDF and using CountVectorizer and TfidfTransformer in sklearn: https://blog.csdn.net/wf592523813/article/details/81911155

LDA:
