- Expanded the grid search for some methods in the Oracle experiment
- Replaced rank plots with critical-difference diagrams
- Added further analysis of annotator agreement
- Various code changes and improvements
- Added the "zero" baseline method
- Added a script to compute summary statistics
- Added rank plots for multivariate datasets
- Corrected an error in the computation of the F1 score and updated the results. This correction had no major effect on the conclusions of the paper.
- Initial release