Fast and lightweight binary and multi-branch Hoeffding Tree Regressors

This repository contains the scripts (and their outputs) used in the manuscript. Due to their size, the datasets are not included here, but all of them are publicly available in the following repositories:

The Friedman and Mv datasets come from data generators and are available in the synth module of River.
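River exposes these generators as `river.datasets.synth.Friedman` and `river.datasets.synth.Mv`, and those classes are what the experiments rely on. To make the Friedman data concrete, here is a minimal, dependency-free sketch of the Friedman #1 function as River's documentation describes it: ten features drawn uniformly from [0, 1], only the first five relevant, plus N(0, 1) noise. It is not River's code and will not reproduce River's random stream for a given seed; `friedman_sample` and `friedman_stream` are hypothetical names.

```python
import math
import random


def friedman_sample(rng: random.Random) -> tuple[dict[int, float], float]:
    """Draw one (x, y) pair following the Friedman #1 definition.

    Ten features are sampled uniformly from [0, 1]; only the first five
    influence the target, which also receives standard Gaussian noise.
    """
    x = {i: rng.random() for i in range(10)}
    y = (
        10 * math.sin(math.pi * x[0] * x[1])
        + 20 * (x[2] - 0.5) ** 2
        + 10 * x[3]
        + 5 * x[4]
        + rng.gauss(0, 1)  # noise term epsilon ~ N(0, 1)
    )
    return x, y


def friedman_stream(n: int, seed: int = 42):
    """Yield n samples one at a time, the way an online learner consumes them."""
    rng = random.Random(seed)
    for _ in range(n):
        yield friedman_sample(rng)


if __name__ == "__main__":
    for x, y in friedman_stream(3):
        print(round(y, 3))
```

Features are keyed by integer index in a dict, matching the one-sample-at-a-time `(x, y)` style River uses for its streams.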

Requirements

To install requirements:

pip install -r requirements.txt

River is the only package required to run the online learning models (its latest development version is preferred). The remaining packages are used to manipulate log files, parse outputs, and generate charts.
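Online regressors are commonly evaluated prequentially: each incoming sample is first used to test the model and then to train it. River packages this loop as `river.evaluate.progressive_val_score`. The sketch below spells the protocol out in plain Python so it runs without River; `RunningMean` is a hypothetical stand-in for a tree model that mirrors River's `predict_one`/`learn_one` interface. It illustrates the general protocol and is not the exact evaluation code in `src`.

```python
import math


class RunningMean:
    """Hypothetical stand-in regressor: predicts the mean target seen so far.

    It mirrors River's predict_one/learn_one interface, so a River
    regressor could be passed to prequential() unchanged.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        # Incremental mean update: no past targets need to be stored.
        self.n += 1
        self.mean += (y - self.mean) / self.n


def prequential(model, stream):
    """Test-then-train loop returning (MAE, RMSE) over the whole stream."""
    abs_err = sq_err = 0.0
    n = 0
    for x, y in stream:
        y_pred = model.predict_one(x)  # test on the sample first...
        model.learn_one(x, y)          # ...then train on it
        err = y - y_pred
        abs_err += abs(err)
        sq_err += err * err
        n += 1
    if n == 0:
        raise ValueError("the stream produced no samples")
    return abs_err / n, math.sqrt(sq_err / n)
```

Because every prediction is made before the model sees the corresponding target, the accumulated error is an honest estimate of online performance without a separate holdout set.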

Folder organization

  • output: Contains all the obtained (raw) logs, charts, and tables
    • airlines_case_study - The case study results
    • charts - biplot charts
    • final - aggregated logs (mean and std) and tree stats
    • nemenyi - input data for the Nemenyi tests
    • tables - LaTeX tables generated via code
  • src: Contains the source code used in the experiments

How to reproduce the experiments

The utils.py file controls all the experimental variables, such as the input and output folders, the number of repetitions, and which algorithms are run. Edit this file to change the experiments' parameters.
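The actual variable names in utils.py are not reproduced in this README. As a purely hypothetical sketch of a central experiment configuration, a single frozen dataclass keeps every setting in one place and prevents scripts from mutating it at run time. All field names and default values below are placeholders, not the repository's identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical central configuration; names and defaults are placeholders."""

    input_folder: str = "datasets"
    output_folder: str = "output"
    n_repetitions: int = 10
    algorithms: tuple[str, ...] = ("tree_a", "tree_b")
    seeds: tuple[int, ...] = ()

    def __post_init__(self):
        if self.n_repetitions < 1:
            raise ValueError("n_repetitions must be at least 1")

    def run_seeds(self) -> tuple[int, ...]:
        # Explicit seeds take precedence; otherwise use one seed per repetition.
        return self.seeds or tuple(range(self.n_repetitions))
```

A runner would then iterate `for seed in cfg.run_seeds(): for name in cfg.algorithms: ...`, so changing the experiment only requires editing the configuration.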

To run the tree models:

python run.py

To run the baselines:

python run_baselines.py

The airlines case study (which uses only a subset of the trees) can be reproduced with:

python run_airlines.py

The result tables can be generated with:

python table_generator.py table-suffix

where table-suffix is a suffix appended to the names of the generated tables. Before that, however, the raw logs must be aggregated with:

python parse_output.py
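The aggregated logs in output/final report a mean and standard deviation per experiment. The log format is not documented here, so the sketch below assumes a hypothetical in-memory structure that maps each (dataset, algorithm) pair to one final metric value per repetition. It shows the aggregation step, a LaTeX row with "mean ± std" cells, and one possible way of appending a suffix to a table's file name. `aggregate`, `latex_row`, and `table_filename` are illustrative names, and the use of the sample (rather than population) standard deviation is an assumption.

```python
import statistics


def aggregate(runs):
    """Collapse per-repetition values into (mean, sample std) per key.

    `runs` maps a (dataset, algorithm) key to the list of values obtained
    across repetitions (hypothetical structure).
    """
    summary = {}
    for key, values in runs.items():
        mean = statistics.mean(values)
        # stdev needs at least two points; report 0.0 for a single run.
        std = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[key] = (mean, std)
    return summary


def latex_row(dataset, algorithms, summary, digits=2):
    """Format one LaTeX table row with 'mean $\\pm$ std' cells."""
    cells = []
    for algo in algorithms:
        mean, std = summary[(dataset, algo)]
        cells.append(f"{mean:.{digits}f} $\\pm$ {std:.{digits}f}")
    return " & ".join([dataset, *cells]) + r" \\"


def table_filename(base, suffix):
    # Assumed naming scheme: the suffix is joined to the base name.
    return f"{base}_{suffix}.tex"
```

For example, values [1.0, 2.0, 3.0] for one algorithm aggregate to (2.0, 1.0) and render as `2.00 $\pm$ 1.00`.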

The table with all the obtained tree stats can be generated with:

python tree_stats_table.py

Additional scripts:

  • data_info_table.py: used to generate a table with the datasets' characteristics
  • biplot_generator.py: generates the biplot used in the paper
  • generate_nemenyi_data.py: assembles the inputs for the Friedman and Nemenyi tests
  • case_study_plot.ipynb: generates the charts concerning the airlines case study
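generate_nemenyi_data.py only assembles the inputs; the tests themselves are not described in this README. For reference, here is a stdlib sketch of the standard procedure described by Demšar (2006): rank the algorithms on each dataset (1 = best, ties share the average rank), compute the Friedman statistic chi2_F = 12N/(k(k+1)) * [sum_j R_j^2 - k(k+1)^2/4] from the average ranks R_j, and compare average-rank differences against the Nemenyi critical difference CD = q_alpha * sqrt(k(k+1)/(6N)). The sketch assumes lower scores (errors) are better and uses the alpha = 0.05 critical values for up to ten algorithms; `ranks` and `friedman_nemenyi` are hypothetical names.

```python
import math

# Two-tailed Nemenyi critical values q_alpha at alpha = 0.05,
# indexed by the number of compared algorithms k.
Q_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
         7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def ranks(row):
    """Rank one dataset's scores (lower is better); ties get the average rank."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    result = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of scores tied with position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            result[order[t]] = avg
        i = j + 1
    return result


def friedman_nemenyi(scores):
    """scores[d][a] is the error of algorithm a on dataset d.

    Returns (average ranks, Friedman chi-square statistic, Nemenyi CD).
    """
    n, k = len(scores), len(scores[0])
    all_ranks = [ranks(row) for row in scores]
    avg = [sum(r[a] for r in all_ranks) / n for a in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4)
    cd = Q_005[k] * math.sqrt(k * (k + 1) / (6 * n))
    return avg, chi2, cd
```

Two algorithms differ significantly under the Nemenyi test when their average ranks differ by more than `cd`; this comparison is what critical-difference diagrams visualize.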