This repository contains the scripts (and the outputs) used in the manuscript. Due to their size, the datasets are not included in this repository, but all of them are publicly available in the following repositories:
The Friedman and Mv datasets come from data generators and are available in the synth module of River.
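For readers unfamiliar with generator-based datasets, the following is a minimal standard-library sketch of the classic Friedman benchmark function. It is not River's code: the function name `friedman_stream`, the seed, and the dict-of-features layout are illustrative assumptions, and the experiments themselves rely on the generators shipped with River.

```python
import math
import random
from typing import Dict, Iterator, Tuple


def friedman_stream(n_samples: int, seed: int = 42) -> Iterator[Tuple[Dict[int, float], float]]:
    """Yield (features, target) pairs from the Friedman benchmark function.

    Hypothetical helper for illustration only; not part of this repository.
    """
    rng = random.Random(seed)
    for _ in range(n_samples):
        # Ten features drawn uniformly from [0, 1]; only the first five affect the target.
        x = {i: rng.uniform(0, 1) for i in range(10)}
        y = (
            10 * math.sin(math.pi * x[0] * x[1])
            + 20 * (x[2] - 0.5) ** 2
            + 10 * x[3]
            + 5 * x[4]
            + rng.gauss(0, 1)  # standard normal noise
        )
        yield x, y


samples = list(friedman_stream(n_samples=5))
print(len(samples), "samples generated")
```

Because the data are produced on demand from a seed, no files need to be downloaded for these two datasets.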
To install requirements:
pip install -r requirements.txt
Only River (preferably the latest/development version) is required to run the online learning models. The remaining packages are used to manipulate log files, parse outputs, and generate charts.
output: Contains all the obtained (raw) logs, charts, and tables
- airlines_case_study: the case study results
- charts: biplot charts
- final: aggregated logs (mean and std) and tree stats
- nemenyi: input data to the Nemenyi tests
- tables: LaTeX tables generated via code

src: Contains the source code used in the experiments
The utils.py file controls all the experimental variables, such as the input and output folders, the number of repetitions, and which algorithms are run. Modify the experiments' parameters there.
To run the tree models:
python run.py
To run the baselines:
python run_baselines.py
To reproduce the airlines case study (note that only a subset of the trees was used in this case):
python run_airlines.py
The results tables can be generated by using:
python table_generator.py table-suffix
where table-suffix is a suffix that is appended to the generated tables. Before that, however, the raw logs must be aggregated using:
python parse_output.py
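As an illustration of what aggregating raw logs into a mean and std can look like, here is a minimal sketch. It does not reproduce parse_output.py: the function `aggregate_runs`, the metric names, and the values are hypothetical, and the real log format may differ.

```python
import statistics
from typing import Dict, List


def aggregate_runs(runs: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Collapse per-repetition metrics into a mean and a sample standard deviation."""
    aggregated = {}
    for metric in runs[0]:
        values = [run[metric] for run in runs]
        aggregated[metric] = {
            "mean": statistics.mean(values),
            # stdev needs at least two values; report 0.0 for a single repetition.
            "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return aggregated


# Hypothetical metric values for three repetitions of one model on one dataset.
runs = [
    {"MAE": 1.0, "RMSE": 2.0},
    {"MAE": 2.0, "RMSE": 4.0},
    {"MAE": 3.0, "RMSE": 6.0},
]
summary = aggregate_runs(runs)
print(summary)
```

The sketch uses the sample standard deviation (n - 1 in the denominator); whether the repository's script uses the sample or the population form is not stated here.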
The table with all the obtained tree stats can be generated with:
python tree_stats_table.py
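The commands above have an ordering constraint: the raw logs must be aggregated before the tables are generated. The following sketch chains them in one plausible order. The helper names `build_pipeline` and `run_pipeline` are hypothetical, the placement of tree_stats_table.py after the aggregation step is an assumption, and the scripts are assumed to be launched from the directory that contains them.

```python
import subprocess
import sys
from typing import List


def build_pipeline(table_suffix: str) -> List[List[str]]:
    """Return the commands in an order that respects the aggregation step."""
    py = sys.executable  # reuse the interpreter that runs this helper
    return [
        [py, "run.py"],                             # tree models
        [py, "run_baselines.py"],                   # baselines
        [py, "parse_output.py"],                    # aggregate raw logs first
        [py, "table_generator.py", table_suffix],   # then build the result tables
        [py, "tree_stats_table.py"],                # assumed to run after aggregation
    ]


def run_pipeline(table_suffix: str) -> None:
    """Execute each step, stopping at the first one that fails."""
    for command in build_pipeline(table_suffix):
        subprocess.run(command, check=True)


# Print the planned commands without executing them.
for command in build_pipeline("example"):
    print(" ".join(command[1:]))
```

Calling `run_pipeline("example")` would execute the steps; the airlines case study is left out because it is run separately.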
Moreover, some additional scripts:
- data_info_table.py: used to generate a table with the datasets' characteristics
- biplot_generator.py: generates the biplot used in the paper
- generate_nemenyi_data.py: assembles the inputs for the Friedman and Nemenyi tests
- case_study_plot.ipynb: generates the charts concerning the airlines case study
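For context on the Friedman and Nemenyi tests mentioned above, the sketch below shows how average ranks, the Friedman chi-square statistic, and the Nemenyi critical difference are commonly computed (following Demšar, 2006). It is not the code in generate_nemenyi_data.py: the function names and the error values are hypothetical, and the critical values cover only alpha = 0.05 with up to ten algorithms.

```python
import math
from typing import Dict, List

# Critical values q_alpha for the two-tailed Nemenyi test at alpha = 0.05,
# indexed by the number of compared algorithms k (Demšar, 2006).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def average_ranks(errors: Dict[str, List[float]]) -> Dict[str, float]:
    """Rank algorithms per dataset (1 = lowest error, ties share the mean rank), then average."""
    names = list(errors)
    n_datasets = len(errors[names[0]])
    totals = {name: 0.0 for name in names}
    for d in range(n_datasets):
        ordered = sorted(names, key=lambda name: errors[name][d])
        i = 0
        while i < len(ordered):
            j = i
            # Extend j over the run of algorithms tied with position i.
            while j + 1 < len(ordered) and errors[ordered[j + 1]][d] == errors[ordered[i]][d]:
                j += 1
            mean_rank = (i + j) / 2 + 1
            for name in ordered[i:j + 1]:
                totals[name] += mean_rank
            i = j + 1
    return {name: total / n_datasets for name, total in totals.items()}


def friedman_statistic(ranks: Dict[str, float], n_datasets: int) -> float:
    """Friedman chi-square statistic computed from the average ranks."""
    k = len(ranks)
    sum_sq = sum(r ** 2 for r in ranks.values())
    return 12 * n_datasets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)


def nemenyi_critical_difference(k: int, n_datasets: int) -> float:
    """Two algorithms differ significantly if their average ranks differ by at least this value."""
    return Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6 * n_datasets))


# Hypothetical errors of three algorithms on four datasets.
errors = {
    "A": [0.1, 0.2, 0.3, 0.4],
    "B": [0.2, 0.3, 0.4, 0.5],
    "C": [0.3, 0.4, 0.5, 0.6],
}
ranks = average_ranks(errors)
chi2 = friedman_statistic(ranks, n_datasets=4)
cd = nemenyi_critical_difference(k=3, n_datasets=4)
print(ranks, chi2, round(cd, 4))
```

In this toy input algorithm A always has the lowest error, so the average ranks are 1, 2, and 3; with only four datasets the critical difference is about 1.66, so only A and C would be declared different.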