Fast and lightweight binary and multi-branch Hoeffding Tree Regressors

This repository contains the scripts (and their outputs) used in the manuscript. Due to their size, the datasets are not included here, but all of them are publicly available in the following repositories:

Luís Torgo repository
UCI repository
Open ML

The Friedman and Mv datasets come from data generators and are available in the synth module of River.

Requirements

To install requirements:

pip install -r requirements.txt

Only River is required to run the online learning models (the latest/development version is preferred). The remaining packages are used to manipulate log files, parse outputs, and generate charts.

Folder organization

output: Contains all the obtained (raw) logs, charts, and tables
- airlines_case_study - The case study results
- charts - biplot charts
- final - aggregated logs (mean and std) and tree stats
- nemenyi - input data for the Nemenyi tests
- tables - LaTeX tables generated via code
src: Contains the source code used in the experiments

How to reproduce the experiments

The utils.py file controls all the experimental variables, such as the input and output folders, the number of repetitions, and which algorithms are run. Modify the experiments' parameters there.

To run the tree models:

python run.py

To run the baselines:

python run_baselines.py

To reproduce the airlines case study (note that only a subset of the trees was used in this case):

python run_airlines.py

The results tables can be generated by using:

python table_generator.py table-suffix

where table-suffix is a suffix appended to the names of the generated tables. The raw logs must be aggregated first, using:

python parse_output.py
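
The aggregation step reduces repeated runs to a mean and a standard deviation. The following is a minimal sketch of that idea, not the code in parse_output.py: the function name `aggregate_runs`, the in-memory list layout, and the example values are assumptions made for illustration.

```python
import statistics


def aggregate_runs(runs):
    """Aggregate repeated runs into per-checkpoint (mean, std) pairs.

    `runs` holds one list of metric values per repetition; every list must
    cover the same checkpoints. This layout is hypothetical.
    """
    if not runs:
        raise ValueError("at least one run is required")
    length = len(runs[0])
    if any(len(run) != length for run in runs):
        raise ValueError("all runs must have the same number of checkpoints")

    aggregated = []
    # zip(*runs) groups the values measured at the same checkpoint.
    for values in zip(*runs):
        mean = statistics.fmean(values)
        # The sample standard deviation needs at least two repetitions.
        std = statistics.stdev(values) if len(values) > 1 else 0.0
        aggregated.append((mean, std))
    return aggregated


# Three made-up repetitions with two checkpoints each (not real results).
runs = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
result = aggregate_runs(runs)
```

Here `result` holds one `(mean, std)` pair per checkpoint, which is the shape of information the aggregated logs in output/final are described as containing.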

The table with all the obtained tree stats can be generated with:

python tree_stats_table.py
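
For readers unfamiliar with generating LaTeX tables from code, the idea can be sketched as below. This is an illustrative assumption, not the implementation of table_generator.py or tree_stats_table.py: the function `to_latex_table`, the column name `HTR`, and the numbers are all hypothetical.

```python
def to_latex_table(rows, columns, caption):
    """Render `rows` (dataset name -> list of (mean, std)) as a LaTeX table."""
    header = " & ".join(["Dataset"] + columns) + r" \\"
    lines = [
        r"\begin{table}",
        r"\centering",
        rf"\caption{{{caption}}}",
        r"\begin{tabular}{l" + "r" * len(columns) + "}",
        r"\hline",
        header,
        r"\hline",
    ]
    for dataset, cells in rows.items():
        if len(cells) != len(columns):
            raise ValueError(f"{dataset}: expected {len(columns)} cells")
        # Underscores must be escaped in LaTeX text mode.
        name = dataset.replace("_", r"\_")
        formatted = [rf"{mean:.2f} $\pm$ {std:.2f}" for mean, std in cells]
        lines.append(" & ".join([name] + formatted) + r" \\")
    lines += [r"\hline", r"\end{tabular}", r"\end{table}"]
    return "\n".join(lines)


# Made-up values for illustration only (not results from the paper).
table = to_latex_table({"Friedman": [(1.0, 0.5)]}, ["HTR"], "Example")
print(table)
```

Building the table as a list of lines keeps the row formatting in one place, so changing the number format or adding a column only touches the loop body.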

Additional scripts:

data_info_table.py: used to generate a table with the datasets' characteristics
biplot_generator.py: generates the biplot used in the paper
generate_nemenyi_data.py: assembles the inputs for the Friedman and Nemenyi tests
case_study_plot.ipynb: generates the charts concerning the airlines case study
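
As background on what the Friedman and Nemenyi inputs feed into, the sketch below computes average ranks, the Friedman chi-square statistic, and the Nemenyi critical difference from a matrix of errors (one row per dataset, one column per algorithm). It is a standard textbook formulation under stated assumptions, not the repository's code: the function names, the matrix layout, and the example values are hypothetical, and the critical values are the usual alpha = 0.05 table entries.

```python
import math

# Nemenyi critical values q_alpha at alpha = 0.05, indexed by the number of
# compared algorithms k (standard table values; only k = 2..6 listed here).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850}


def rank_row(errors):
    """Rank one dataset's errors (1 = lowest), averaging the ranks of ties."""
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    ranks = [0.0] * len(errors)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and errors[order[end + 1]] == errors[order[start]]:
            end += 1
        # Sorted positions start..end (0-based) share the same average rank.
        average = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average
        start = end + 1
    return ranks


def friedman_nemenyi(error_matrix):
    """Return (average ranks, Friedman chi-square, Nemenyi critical difference)."""
    n = len(error_matrix)      # number of datasets
    k = len(error_matrix[0])   # number of algorithms
    rank_rows = [rank_row(row) for row in error_matrix]
    avg_ranks = [sum(row[j] for row in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )
    cd = Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6 * n))
    return avg_ranks, chi2, cd


# Made-up errors: 4 datasets (rows) x 3 algorithms (columns).
errors = [
    [0.1, 0.2, 0.3],
    [0.2, 0.1, 0.3],
    [0.1, 0.2, 0.3],
    [0.1, 0.3, 0.2],
]
avg_ranks, chi2, cd = friedman_nemenyi(errors)
```

Two algorithms are considered significantly different under the Nemenyi test when their average ranks differ by more than `cd`.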
