model.wv.accuracy messes up the model object #2014

Open
@ahmedahmedov

Description

I was playing around with model.wv.accuracy to test the accuracy of my model. It worked fine when the only parameter I set was the filename:

model.wv.accuracy("questions-words.txt")
Things get weird though when I set the 'most_similar' parameter of the function:

model.wv.accuracy("questions-words.txt", most_similar=gensim.models.KeyedVectors.most_similar_cosmul)

First, it raises an error saying that gensim.models.KeyedVectors.most_similar_cosmul doesn't accept a parameter named 'restrict_vocab'.
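The mismatch can be illustrated with a small adapter. This is only a sketch: `without_restrict_vocab` is a hypothetical helper, not part of gensim, and it assumes that accuracy() passes `restrict_vocab` as a keyword argument to whatever callable it receives (the keyword appears in the accuracy signature shown in the traceback below).

```python
import functools


def without_restrict_vocab(most_similar_fn):
    """Hypothetical adapter: accept restrict_vocab, then drop it."""

    @functools.wraps(most_similar_fn)
    def wrapper(kv, *args, restrict_vocab=None, **kwargs):
        # restrict_vocab is deliberately ignored, so callables that do not
        # declare it can still be called with that keyword.
        return most_similar_fn(kv, *args, **kwargs)

    return wrapper
```

Passing `without_restrict_vocab(gensim.models.KeyedVectors.most_similar_cosmul)` as `most_similar` might avoid the first error, but this has not been verified against gensim, and it would silently ignore any vocabulary restriction.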

More importantly, the model vocabulary is corrupted right after that. When I call model.wv.accuracy("questions-words.txt") again, I get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-36-41dd7cacf846> in <module>()
----> 1 model.wv.accuracy("questions-words.txt", most_similar=gensim.models.KeyedVectors.most_similar_cosmul)
      2 model.wv.evaluate_word_pairs("wordsim353.tsv")

/home/user/anaconda3/lib/python3.5/site-packages/gensim/models/keyedvectors.py in accuracy(self, questions, restrict_vocab, most_similar, case_insensitive)
    772 
    773         """
--> 774         ok_vocab = [(w, self.vocab[w]) for w in self.index2word[:restrict_vocab]]
    775         ok_vocab = {w.upper(): v for w, v in reversed(ok_vocab)} if case_insensitive else dict(ok_vocab)
    776 

/home/user/anaconda3/lib/python3.5/site-packages/gensim/models/keyedvectors.py in <listcomp>(.0)
    772 
    773         """
--> 774         ok_vocab = [(w, self.vocab[w]) for w in self.index2word[:restrict_vocab]]
    775         ok_vocab = {w.upper(): v for w, v in reversed(ok_vocab)} if case_insensitive else dict(ok_vocab)
    776 

KeyError: 'the'

Calling model.wv.most_similar gives the same error.
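One plausible explanation, stated as an assumption rather than a confirmed diagnosis: if accuracy() temporarily replaces self.vocab with an upper-cased copy and restores it only after the most_similar call returns, then an exception raised inside that call would leave the upper-cased vocabulary in place, and a later lookup of 'the' would fail. The toy class below is hypothetical and is not gensim code; it only reproduces that pattern and shows how try/finally would guard against it.

```python
class ToyVectors:
    """Hypothetical stand-in for a keyed-vectors object; not gensim code."""

    def __init__(self, words):
        self.index2word = list(words)
        self.vocab = {w: i for i, w in enumerate(self.index2word)}

    def accuracy_unsafe(self, most_similar):
        # Swap in an upper-cased vocabulary for the duration of the call.
        original_vocab = self.vocab
        self.vocab = {w.upper(): v for w, v in original_vocab.items()}
        most_similar(self, restrict_vocab=None)  # if this raises ...
        self.vocab = original_vocab  # ... this restore never runs

    def accuracy_safe(self, most_similar):
        original_vocab = self.vocab
        self.vocab = {w.upper(): v for w, v in original_vocab.items()}
        try:
            most_similar(self, restrict_vocab=None)
        finally:
            # Restore the vocabulary even when the callable raises.
            self.vocab = original_vocab


def no_restrict_vocab(kv, positive=None, negative=None, topn=10):
    """Mimics a callable that does not accept restrict_vocab."""
    return []


def vocab_survives(method_name):
    """Return True if 'the' is still a vocab key after a failed call."""
    kv = ToyVectors(["the", "king", "queen"])
    try:
        getattr(kv, method_name)(no_restrict_vocab)
    except TypeError:
        pass  # the unexpected-keyword error from the first call
    return "the" in kv.vocab
```

In this sketch, `vocab_survives("accuracy_unsafe")` is False and `vocab_survives("accuracy_safe")` is True, which matches the symptom reported above: the first TypeError is followed by KeyError: 'the' on every later lookup.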

Labels

bug: Issue described a bug
difficulty easy: Easy issue: required small fix
impact MEDIUM: Big annoyance for affected users