Changes include:
- Users can now input ratings to weigh recommendations
- Fixes for how recommendations for multiple inputs were calculated
- Switched to a src directory structure
- Code quality is now checked with Codacy
- Extensive code formatting to improve quality and style
- Bug fixes and a more explicit use of exceptions
- More extensive contributing guidelines
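
As an illustration of how ratings could weigh recommendations for multiple inputs, here is a minimal sketch. It is not the package's actual implementation: the function name `recommend`, its parameters, and the rating-weighted average of similarity scores are all assumptions made for this example.

```python
def recommend(sim_matrix, titles, inputs, ratings=None, n=2):
    """Rank titles by the rating-weighted mean similarity to the inputs.

    Hypothetical helper for illustration only.
    """
    if ratings is None:
        # Without ratings, every input counts equally.
        ratings = [1.0] * len(inputs)
    if len(ratings) != len(inputs):
        raise ValueError("Each input needs exactly one rating.")
    total = sum(ratings)
    if total <= 0:
        raise ValueError("Ratings must sum to a positive number.")

    # list.index raises ValueError for an unknown title.
    input_idxs = [titles.index(t) for t in inputs]

    scores = []
    for j, title in enumerate(titles):
        if title in inputs:
            continue  # never recommend an input back to the user
        weighted = sum(r * sim_matrix[i][j] for i, r in zip(input_idxs, ratings))
        scores.append((title, weighted / total))

    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]


# Toy data: "C" is close to "A", and "D" is close to "B".
titles = ["A", "B", "C", "D"]
sim = [
    [1.0, 0.2, 0.9, 0.1],
    [0.2, 1.0, 0.1, 0.8],
    [0.9, 0.1, 1.0, 0.3],
    [0.1, 0.8, 0.3, 1.0],
]

likes_a = recommend(sim, titles, ["A", "B"], ratings=[10, 2])
likes_b = recommend(sim, titles, ["A", "B"], ratings=[2, 10])
```

In this toy data, rating "A" highly moves "C" to the top of the list, while rating "B" highly moves "D" to the top.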
Changes include:
- Multiple infobox topics can be subset at the same time
- Users have greater control of the cleaning process
- The cleaning process now reports its progress and uses multiprocessing
- The workflow for all models has been improved and explained
- Methods have been developed to combine modeling techniques for better results
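
One simple way to combine modeling techniques is a weighted average of the similarity matrices that each model produces. The sketch below assumes this approach; the helper name `combine_similarity_matrices`, the weights, and the toy matrices are hypothetical and are not taken from the package.

```python
def combine_similarity_matrices(matrices, weights):
    """Return the element-wise weighted average of equally sized square matrices.

    Hypothetical helper for illustration only.
    """
    if len(matrices) != len(weights):
        raise ValueError("Each matrix needs exactly one weight.")
    total = sum(weights)
    if total <= 0:
        raise ValueError("Weights must sum to a positive number.")

    size = len(matrices[0])
    return [
        [
            sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
            for j in range(size)
        ]
        for i in range(size)
    ]


# Toy similarity matrices standing in for two models' outputs.
bert_sim = [[1.0, 0.6], [0.6, 1.0]]
tfidf_sim = [[1.0, 0.2], [0.2, 1.0]]

combined = combine_similarity_matrices([bert_sim, tfidf_sim], weights=[0.75, 0.25])
```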
First stable release of wikirec
- Functions to subset Wikipedia in any language by infobox topics have been provided
- A multilingual cleaning process is included that can clean texts in any language, with varying degrees of efficacy
- Similarity matrices can be generated from embeddings using the following models:
  - BERT
  - Doc2vec
  - LDA
  - TFIDF
- Similarity matrices can be created using either cosine or Euclidean relations
- Usage examples have been provided for multiple input types
- Optimal LDA topic numbers can be inferred graphically
- The package is fully documented
- Virtual environment files are provided
- All modules are extensively tested with GitHub Actions and Codecov
- A code of conduct and contribution guidelines are included
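
To illustrate the difference between cosine and Euclidean relations, here is a minimal sketch that builds a similarity matrix from a list of embedding vectors. The function names and the `1 / (1 + distance)` mapping for Euclidean similarity are assumptions chosen for this example and may differ from what the package does; the vectors are assumed to be nonzero.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_similarity(u, v):
    """Map Euclidean distance into (0, 1]; identical vectors score 1.

    This particular mapping is an assumption made for the example.
    """
    return 1.0 / (1.0 + math.dist(u, v))


def similarity_matrix(embeddings, metric="cosine"):
    """Pairwise similarity of all embeddings under the chosen metric."""
    if metric == "cosine":
        score = cosine_similarity
    elif metric == "euclidean":
        score = euclidean_similarity
    else:
        raise ValueError(f"Unknown metric: {metric!r}")
    return [[score(u, v) for v in embeddings] for u in embeddings]


# Toy 2-D embeddings standing in for model output.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

cos_sim = similarity_matrix(embeddings, metric="cosine")
euc_sim = similarity_matrix(embeddings, metric="euclidean")
```

Cosine similarity depends only on the direction of the vectors, while the Euclidean variant also reflects their magnitudes, so the two can rank the same pairs differently.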