Releases · andrewtavis/wikirec · GitHub

28 Dec 16:21

andrewtavis

wikirec 1.0.0 Latest

This release switches wikirec over to semantic versioning and marks the package as stable

20 May 09:59

andrewtavis

wikirec 0.2.2

Changes include:

The WikilinkNN model has been added, allowing users to derive recommendations based on which articles are linked to the same other Wikipedia articles
Examples have been updated to reflect this new model
books_embedding_model.h5 is provided for quick experimentation
enwiki_books.ndjson has been updated with a more recent dump
Function docstring grammar fixes
Baseline testing for the new model has been added to the CI

29 Apr 11:51

andrewtavis

wikirec 0.2.1

Changes include:

Support has been added for gensim 3.8.x and 4.x
Wikipedia links are now an output of data_utils.parse_to_ndjson
Dependencies in the requirements and environment files are now condensed
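
The NDJSON output mentioned above stores one JSON value per line, so it can be read back with the standard library alone. The following is a minimal sketch, not wikirec's own loader: `read_ndjson` is a hypothetical helper, and the record layout in the usage note (title, text, links) is an assumption about what `data_utils.parse_to_ndjson` writes.

```python
import json


def read_ndjson(path):
    """Return one parsed JSON value per non-empty line of the file.

    Hypothetical helper for illustration; it is not part of wikirec.
    """
    records = []
    with open(path, encoding="utf-8") as fin:
        for line in fin:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records
```

If each record is assumed to be a list such as `[title, text, links]`, the titles could then be collected with `[record[0] for record in read_ndjson("enwiki_books.ndjson")]`.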

16 Apr 15:21

andrewtavis

wikirec 0.2.0

Changes include:

Users can now input ratings to weigh recommendations
Fixes for how recommendations for multiple inputs were being calculated
Switching over to an src structure
Code quality is now checked with Codacy
Extensive code formatting to improve quality and style
Bug fixes and a more explicit use of exceptions
More extensive contributing guidelines
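
To illustrate the idea of weighting recommendations by user ratings, here is a minimal sketch under stated assumptions. It is not wikirec's implementation: `weighted_scores` is a hypothetical function, and the rating-weighted average of similarity rows is an assumed scheme chosen for clarity.

```python
def weighted_scores(sim_matrix, titles, inputs, ratings):
    """Rank candidate titles by a rating-weighted average similarity.

    Hypothetical helper: sim_matrix[i][j] is assumed to hold the
    similarity between titles[i] and titles[j].
    """
    if len(inputs) != len(ratings):
        raise ValueError("inputs and ratings must have the same length")
    total = float(sum(ratings))
    if total <= 0:
        raise ValueError("ratings must sum to a positive number")

    index = {title: i for i, title in enumerate(titles)}
    scores = {}
    for j, candidate in enumerate(titles):
        if candidate in inputs:
            continue  # never recommend an item the user supplied
        weighted = sum(
            rating * sim_matrix[index[title]][j]
            for title, rating in zip(inputs, ratings)
        )
        scores[candidate] = weighted / total
    # Highest score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With this scheme, a highly rated input pulls the ranking toward items similar to it, while a low-rated input contributes little.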

14 Mar 20:37

andrewtavis

wikirec 0.1.1.7

Changes include:

Multiple infobox topics can be subset at the same time
Users have greater control of the cleaning process
The cleaning process is verbose and uses multiprocessing
The workflow for all models has been improved and explained
Methods have been developed to combine modeling techniques for better results

08 Mar 19:13

andrewtavis

wikirec 0.1.0

wikirec 0.1.0 (March 8, 2021)

First stable release of wikirec

Functions to subset Wikipedia in any language by infobox topics have been provided
A multilingual cleaning process that can clean texts of any language to varying degrees of efficacy is included
Similarity matrices can be generated from embeddings using the following models:
- BERT
- Doc2vec
- LDA
- TFIDF
Similarity matrices can be created using either cosine or Euclidean relations
Usage examples have been provided for multiple input types
Optimal LDA topic numbers can be inferred graphically
The package is fully documented
Virtual environment files are provided
Extensive testing of all modules with GH Actions and Codecov has been performed
A code of conduct and contribution guidelines are included
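
As an illustration of the cosine and Euclidean options mentioned above, the following is a minimal standard-library sketch of building a similarity matrix from embedding vectors. It is not wikirec's code: the function names are hypothetical, and mapping a Euclidean distance `d` to a similarity via `1 / (1 + d)` is an assumption made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_similarity(a, b):
    """Map Euclidean distance d to a similarity in (0, 1] via 1 / (1 + d).

    This mapping is an assumption for illustration only.
    """
    return 1.0 / (1.0 + math.dist(a, b))


def similarity_matrix(vectors, metric="cosine"):
    """Return the full pairwise similarity matrix as a list of lists."""
    if metric == "cosine":
        measure = cosine_similarity
    elif metric == "euclidean":
        measure = euclidean_similarity
    else:
        raise ValueError("metric must be 'cosine' or 'euclidean'")
    return [[measure(a, b) for b in vectors] for a in vectors]
```

Cosine similarity ignores vector length and compares direction only, whereas the Euclidean variant also penalizes differences in magnitude, so the two can rank the same items differently.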