Further focus/slim keyedvectors.py module #2873

Open
@gojomo

Description

Pre-#2698, keyedvectors.py was 2500+ lines, including functionality over-specific to other models, & redundant classes. Post-#2698, with some added generic functionality, it's still over 1800 lines.

It should shed some other grab-bag utility functions that have accumulated, & don't logically fit inside the KeyedVectors class.

In particular, the evaluation (analogies, word_ranks) helpers could move to their own module that takes a KV instance as an argument. (If other more-sophisticated evaluations can be contributed, as would be welcome, they should also live alongside those, rather than bloating KeyedVectors.)
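As a rough sketch of what "takes a KV instance as an argument" might look like: the function name `analogy_accuracy` and the `VectorLookup` protocol below are hypothetical, not existing gensim names. The helper assumes only that the object supports `in` checks and a `most_similar(positive=..., negative=..., topn=...)` call returning `(key, similarity)` pairs, as `KeyedVectors` does.

```python
"""Hypothetical standalone evaluation helper (all names here are illustrative)."""
from typing import Iterable, List, Protocol, Sequence, Tuple


class VectorLookup(Protocol):
    """The small slice of the KeyedVectors interface this helper relies on."""

    def __contains__(self, key: str) -> bool: ...

    def most_similar(
        self, positive: Sequence[str], negative: Sequence[str], topn: int
    ) -> List[Tuple[str, float]]: ...


def analogy_accuracy(
    kv: VectorLookup, quads: Iterable[Tuple[str, str, str, str]]
) -> float:
    """Fraction of analogies a:b :: c:expected whose top-1 answer is `expected`.

    Quadruples containing a key missing from `kv` are skipped rather than
    counted as wrong. Returns 0.0 if nothing could be evaluated.
    """
    correct = total = 0
    for a, b, c, expected in quads:
        if not all(word in kv for word in (a, b, c, expected)):
            continue  # out-of-vocabulary: not scored
        total += 1
        top = kv.most_similar(positive=[b, c], negative=[a], topn=1)
        if top and top[0][0] == expected:
            correct += 1
    return correct / total if total else 0.0
```

Because the helper only depends on that narrow interface, it could live in a separate evaluation module and be called as `analogy_accuracy(kv, quads)` without `KeyedVectors` needing to know about it.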

The get_keras_embedding method, as its utility is narrow to very specific uses, and it's conditional on a not-necessarily-installed package, could go elsewhere too – either a keras-focused utilities module, or even just documentation/example code about how to convert to/from keras from `KeyedVectors`.
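For the "just example code" option, something along these lines might be enough. This is a sketch under assumptions, not necessarily what the current method does: the function name `kv_to_keras_embedding` is hypothetical, it assumes the `keras` package is installed, and it seeds the layer through a `Constant` initializer.

```python
def kv_to_keras_embedding(kv, trainable=False):
    """Build a Keras Embedding layer initialised from a KeyedVectors-like object.

    `kv` only needs a `.vectors` 2-D array with one row per key.
    Hypothetical helper, shown as documentation-style example code.
    """
    # Imported lazily so that importing this module never requires Keras.
    from keras.initializers import Constant
    from keras.layers import Embedding

    weights = kv.vectors
    return Embedding(
        input_dim=weights.shape[0],   # number of keys
        output_dim=weights.shape[1],  # vector dimensionality
        embeddings_initializer=Constant(weights),
        trainable=trainable,          # False keeps the vectors frozen
    )
```

Row `i` of the resulting layer corresponds to `kv.index_to_key[i]`, so token ids fed to the layer would need to come from `kv.key_to_index`.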

Some of the more advanced word-vector-using calculations, like 'Word Mover's Distance' or 'Soft Cosine Similarity', could move to method-specific modules that are then better documented/self-contained/optimized, without bloating the generic 'set of vectors' module. (They might be more discoverable there, as well.)

And finally, some of the existing calculations could be unified/streamlined (especially the two variants of most_similar(), and some of the steps shared by multiple operations). My hope would be the module is eventually <1000 lines.
