Changelog

Version 3.0

Expanded the grid search for some methods in the Oracle experiment
Replaced rank plots with critical-difference diagrams
Added further analysis of annotator agreement
Various code changes and improvements

Version 2.0

Added the "zero" baseline method
Added a script to compute summary statistics
Added rank plots for multivariate datasets
Corrected an error in the computation of the F1 score and updated the results. The correction had no major effect on the conclusions of the paper.

Version 1.0

Initial release