TermExtractRPCA

A Robust PCA Approach to Term Extraction

Because of limitations in the algorithm itself, the program is not optimized for Chinese text. It works only at the word level and does not take semantics into account.

The Robust PCA implementation is taken from https://github.com/14MBD4/pytorch-RPCA

The corpus is translated from an article published on a website, and it is used with the original author's permission.

Usage:

Transform the raw corpus by removing punctuation symbols, pure numbers, and empty lines:

python main.py corpus.txt corpus_transform
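
The cleaning step described above could look roughly like the following. This is a minimal sketch, not the code in main.py: the function name `clean_corpus` is hypothetical, only ASCII punctuation from `string.punctuation` is handled, and replacing punctuation with a space (rather than deleting it) is an assumption.

```python
import string

# Hypothetical helper, not taken from the repository.
# Assumption: each punctuation character is replaced by a space so that
# hyphenated words split into separate tokens.
_PUNCT_TO_SPACE = str.maketrans({c: " " for c in string.punctuation})


def clean_corpus(text):
    """Drop ASCII punctuation, purely numeric tokens, and empty lines."""
    cleaned_lines = []
    for line in text.splitlines():
        tokens = line.translate(_PUNCT_TO_SPACE).split()
        # Keep only tokens that are not made up entirely of digits.
        tokens = [t for t in tokens if not t.isdigit()]
        if tokens:  # skip lines that end up empty
            cleaned_lines.append(" ".join(tokens))
    return "\n".join(cleaned_lines)


if __name__ == "__main__":
    print(clean_corpus("Hello, world! 42\n\n123 456\nRobust-PCA works."))
```

Non-ASCII punctuation (for example full-width Chinese marks) passes through this sketch unchanged.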

Output the extracted terms with their values after RPCA:

python main.py corpus_transformed.txt
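
For readers unfamiliar with RPCA, the sketch below shows a generic Robust PCA decomposition of a matrix M into a low-rank part L and a sparse part S, using Principal Component Pursuit with an augmented-Lagrangian iteration in NumPy. It is an illustration under assumptions, not the pytorch-RPCA code this project uses. The function names and the default choices for `lam` and `mu` are common conventions rather than values taken from this repository, and how main.py builds its input matrix from the corpus is not shown here.

```python
import numpy as np


def shrink(X, tau):
    """Element-wise soft-thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def robust_pca(M, max_iter=500, tol=1e-7):
    """Split M into low-rank L and sparse S so that M is close to L + S.

    Assumes M is a 2-D array that is not all zeros.
    """
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))        # conventional sparsity weight
    mu = m * n / (4.0 * np.abs(M).sum())  # conventional penalty parameter
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                  # Lagrange multiplier
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low_rank = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 20))
    sparse = np.zeros((20, 20))
    sparse[3, 5] = 10.0
    L, S = robust_pca(low_rank + sparse)
    print(np.linalg.norm(low_rank + sparse - L - S))
```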

Output the words in the article with their counts (along with the terms and their corresponding values):

python main.py corpus_transformed.txt vocab_count
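
Word counting of this kind can be sketched with the standard library. The function name `count_vocab` is hypothetical, and lowercasing and whitespace tokenization are assumptions, not confirmed behavior of main.py.

```python
from collections import Counter


def count_vocab(text):
    """Count whitespace-separated words, ignoring case (an assumption)."""
    return Counter(text.lower().split())


if __name__ == "__main__":
    counts = count_vocab("robust pca for term extraction with Robust PCA")
    # most_common() lists words from most to least frequent.
    for word, n in counts.most_common(3):
        print(word, n)
```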

Output a word-cloud figure (along with the terms and their corresponding values):

python main.py corpus_transformed.txt word_cloud

Output a word-cloud figure based on the user's own selection or a combination of results:

python main.py self_word_cloud

