Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

most_similar_among functionality added with tests #1255

Closed
wants to merge 2 commits into from

Conversation

shubhvachher
Copy link
Contributor

Fix #148. After @gojomo's comprehensive review from #1229

I have split PR's for all drive-by bug fixes as suggested. Some follow ups from previous comments :

Since people might not find index2word, I can see the benefit of an accessor method. However, given that this is now a generic KeyedVectors class, the name shouldn't be 'words'-centric. I'd suggest ordered_keys.

Done. I have also added a method to class Word2Vec in order to solve the original "people might not find index2word" problem.

I think suppress_warnings adds extra complication for little benefit.

Actually, calculating difference in sets (which is needed to generate this warning) can be an unnecessary overhead for people using this code in production environment. I have added a logging.info call to inform users of the same.
Sometimes it is not possible to remove all non-vocabulary words from the words-list, for example, if words-list is generated during runtime. Here using suppress_warnings would be ideal.

Seems more like something to log than use warnings.

Done. Yes, you're right. The module uses logging throughout, unnecessary import and use of warnings was not a good choice.

Because most_similar_among() functionality is merely on a set of vectors, it's not dependent on the mode-of-training – so there's no need for multiple models built different ways.

Ok sure, removing extra tests.

In fact, it'd ultimately make even more sense to be inside a set of KeyedVectors tests, rather than Word2Vec tests, but since its sibling methods are tested here it's not necessary to make that big change yet.

Quoted here to keep track.

@tmylk
Copy link
Contributor

tmylk commented Apr 3, 2017

Please continue in the original PR

@tmylk tmylk closed this Apr 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

word2vec errors with GoogleNews-vectors-negative300.bin model
2 participants