I trained an NPMI phraser on the latest Wikipedia dump. My understanding is that NPMI scores should be <= 1.0, but I get a higher score.
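For reference, here is a minimal sketch of the textbook NPMI definition that the expectation above is based on. The function name `npmi` and all counts are hypothetical and chosen for illustration only; this is not necessarily how gensim computes the score internally. With mutually consistent counts (a bigram never occurs more often than either of its words) the value cannot exceed 1.0; a value above 1.0 requires p(a,b)^2 > p(a) * p(b), which only happens when the counts are inconsistent with each other.

```python
import math


def npmi(count_a, count_b, count_ab, total):
    """Textbook NPMI: ln(p(a,b) / (p(a) * p(b))) / -ln(p(a,b)).

    Hypothetical helper for illustration; not gensim's own code.
    """
    p_a = count_a / total
    p_b = count_b / total
    p_ab = count_ab / total
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


# Consistent counts: the two words only ever occur together (upper bound).
perfect = npmi(count_a=50, count_b=50, count_ab=50, total=1000)

# Consistent counts: the words sometimes occur apart.
typical = npmi(count_a=100, count_b=80, count_ab=40, total=10000)

# Inconsistent counts (made up): the bigram count exceeds both word counts.
inconsistent = npmi(count_a=10, count_b=10, count_ab=50, total=1000)

print(perfect, typical, inconsistent)
```

Only the last call, where the bigram count is larger than the individual word counts, yields a score above 1.0.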
Steps/code/corpus to reproduce
from gensim.corpora import WikiCorpus
from gensim.models import Phrases
from gensim.models.phrases import Phraser

wiki_corpus = WikiCorpus("enwiki-latest-pages-articles-multistream.xml.bz2", dictionary={})

ENGLISH_CONNECTOR_WORDS = frozenset(
    " a an the "  # articles; we never care about these in MWEs
    " for of with without at from to in on by "  # prepositions; incomplete on purpose, to minimize FNs
    " and or "  # conjunctions; incomplete on purpose, to minimize FNs
    .split()
)

phrases = Phrases(wiki_corpus.get_texts(), scoring='npmi', threshold=0.75, min_count=5, common_terms=ENGLISH_CONNECTOR_WORDS, max_vocab_size=80000000)
phraser = Phraser(phrases)
Then:
Versions