Update README.md
minasmz authored Nov 22, 2021
1 parent 893d354 commit 23b1a67
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -1,4 +1,6 @@
# Persian-Summarization

For more information please refer to [our article](http://conf.kntu.ac.ir/cnf_papers/csicc2021/articleFiles2/r_411_201229073907.pdf) and cite it if it was helpful in your work.
# Statistical and semantic text summarizer for the Persian language

This project summarizes text in the Persian language. It uses the summarization module of the [Gensim Python library](https://github.com/RaRe-Technologies/gensim) to implement the [TextRank algorithm](https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf). TextRank treats each sentence as a node in a graph and returns the nodes that are most strongly related to the other nodes (sentences). In other words, it picks the most important sentences through purely statistical calculation and ignores the semantics of the sentences. For instance, if the same meaning is expressed with different words, the algorithm does not recognize this and treats the sentences as different when in reality they are not. To solve this problem and bring semantics into the result, I trained a doc2vec model with Gensim's `doc2vec.py`, using the [Hamshahri corpus](http://dbrg.ut.ac.ir/hamshahri/) as the training set. The doc2vec model is included in the repository (`my_model_sents_from_res2.doc2vec`). I use this model to calculate the similarity of two sentences and weight the graph edges with it (instead of the tf-idf-based weighting that Gensim uses), and then return the result of the TextRank algorithm.
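
The idea of ranking sentences on a graph whose edges are weighted by embedding similarity can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code in this repository: the function names (`cosine`, `textrank`, `summarize`), the damping factor, and the iteration count are assumptions made for the example. The sentence vectors are passed in directly; in the actual project they would presumably come from the trained doc2vec model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def textrank(vectors, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over a similarity graph.

    `vectors` holds one embedding per sentence. Edge weights are cosine
    similarities clipped at zero; there are no self-loops.
    """
    n = len(vectors)
    weights = [
        [max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in weights]  # total outgoing weight per node
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            incoming = sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


def summarize(sentences, vectors, top_k=2):
    """Return the `top_k` highest-scoring sentences in their original order."""
    scores = textrank(vectors)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]


# Toy data (hypothetical): three closely related sentences and one outlier.
toy_sentences = ["s1", "s2", "s3", "outlier"]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
toy_scores = textrank(toy_vectors)
toy_summary = summarize(toy_sentences, toy_vectors, top_k=2)
```

With these toy vectors the outlier sentence is only weakly connected to the rest, so it receives the lowest score and is left out of the summary. Swapping the cosine function for another similarity measure changes only how the edges are weighted; the ranking step stays the same.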