Scripts for reproducing the experiments of our paper at the IncrLearn workshop @ ICDM 2021

smastelini/icdm2021_multiway_splits

Fast and lightweight binary and multi-branch Hoeffding Tree Regressors

This repository contains the scripts (and the outputs) used in the manuscript. Due to their size, the datasets are not included in this repository, but all of them are publicly available in the following repositories:

The Friedman and Mv datasets come from data generators and are available in the synth module of River.
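
For readers unfamiliar with the Friedman generator, the following is a minimal standalone sketch of the standard Friedman #1 regression function. It does not call River and is not River's implementation; the function name `friedman_stream`, the seed, and the dictionary layout of the features are assumptions made for illustration.

```python
import math
import random


def friedman_stream(n_samples, seed=42):
    """Yield (features, target) pairs following the Friedman #1 function."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        # Ten uniform features in [0, 1]; only the first five affect the target.
        x = {i: rng.random() for i in range(10)}
        y = (
            10 * math.sin(math.pi * x[0] * x[1])
            + 20 * (x[2] - 0.5) ** 2
            + 10 * x[3]
            + 5 * x[4]
            + rng.gauss(0, 1)  # additive standard Gaussian noise
        )
        yield x, y


# Materialize a few instances, as an online learner would consume them one by one.
samples = list(friedman_stream(5))
```

Because the generator is seeded, repeated runs yield the same stream, which is what makes synthetic experiments of this kind repeatable.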

Requirements

To install requirements:

pip install -r requirements.txt

Only River is required to run the online learning models (the latest development version is preferred). The remaining packages are used to manipulate log files, parse outputs, and generate charts.

Folder organization

  • output: contains all the obtained (raw) logs, charts, and tables
    • airlines_case_study: the case study results
    • charts: biplot charts
    • final: aggregated logs (mean and std) and tree stats
    • nemenyi: input data for the Nemenyi tests
    • tables: LaTeX tables generated via code
  • src: contains the source code used in the experiments

How to reproduce the experiments

The utils.py file controls all the experimental variables, such as the input and output folders, the number of repetitions, and which algorithms are run. Modify the experiments' parameters there.
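
As a rough illustration of how such centralized settings can drive a set of runs, here is a minimal sketch. Every name and value in it (`INPUT_DIR`, `OUTPUT_DIR`, `N_REPETITIONS`, `MODELS`, `DATASETS`, `experiment_grid`) is hypothetical and does not reflect the actual contents of utils.py.

```python
from itertools import product

# Hypothetical settings; the real names and values live in utils.py.
INPUT_DIR = "datasets"
OUTPUT_DIR = "output"
N_REPETITIONS = 3
MODELS = ["model_a", "model_b"]
DATASETS = ["friedman", "mv"]


def experiment_grid():
    """List every (dataset, model, repetition) run implied by the settings."""
    return list(product(DATASETS, MODELS, range(N_REPETITIONS)))


runs = experiment_grid()
print(f"{len(runs)} runs, reading from {INPUT_DIR!r} and writing to {OUTPUT_DIR!r}")
```

Keeping the variables in one place means that changing the number of repetitions or the list of algorithms requires editing a single file rather than each run script.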

To run the tree models:

python run.py

To run the baselines:

python run_baselines.py

To reproduce the airlines case study (note that only a subset of the trees was used in this case):

python run_airlines.py

To generate the results tables:

python table_generator.py table-suffix

where table-suffix is a suffix appended to the names of the generated tables. Before generating the tables, however, the raw logs must be aggregated using:

python parse_output.py
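
The aggregation step reduces the repeated runs of each model to a mean and a standard deviation. The sketch below shows that idea using only the standard library. It is not the code in parse_output.py, and the log layout, model names, and values are hypothetical.

```python
import statistics
from collections import defaultdict

# Hypothetical raw logs: (model, repetition, final error) triples.
raw_logs = [
    ("tree_a", 0, 1.20), ("tree_a", 1, 1.40), ("tree_a", 2, 1.30),
    ("tree_b", 0, 2.00), ("tree_b", 1, 2.20), ("tree_b", 2, 2.10),
]


def aggregate(logs):
    """Group values by model and return {model: (mean, sample std)}."""
    grouped = defaultdict(list)
    for model, _repetition, value in logs:
        grouped[model].append(value)
    return {
        model: (statistics.mean(values), statistics.stdev(values))
        for model, values in grouped.items()
    }


summary = aggregate(raw_logs)
for model, (mean, std) in summary.items():
    print(f"{model}: {mean:.2f} +/- {std:.2f}")
```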

The table with all the obtained tree stats can be generated with:

python tree_stats_table.py

Some additional scripts are also provided:

  • data_info_table.py: used to generate a table with the datasets' characteristics
  • biplot_generator.py: generates the biplot used in the paper
  • generate_nemenyi_data.py: assembles the inputs for the Friedman and Nemenyi tests
  • case_study_plot.ipynb: generates the charts concerning the airlines case study
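
For context on the Friedman and Nemenyi tests mentioned above, the following sketch computes average ranks, the Friedman chi-square statistic, and the Nemenyi critical difference from a small error matrix. It is a textbook-style illustration, not the repository's code: the error values are made up, ties are assumed absent, and the critical value 2.343 is the commonly tabulated one for three algorithms at a significance level of 0.05.

```python
import math

# Hypothetical errors: one row per dataset, one column per algorithm (lower is better).
errors = [
    [0.30, 0.25, 0.40],
    [0.50, 0.45, 0.55],
    [0.20, 0.22, 0.35],
    [0.60, 0.50, 0.70],
]


def average_ranks(rows):
    """Rank algorithms within each dataset (1 = best) and average over datasets."""
    k = len(rows[0])
    totals = [0.0] * k
    for row in rows:
        order = sorted(range(k), key=lambda j: row[j])  # assumes no ties
        for rank, j in enumerate(order, start=1):
            totals[j] += rank
    return [total / len(rows) for total in totals]


def friedman_statistic(avg_ranks, n):
    """Friedman chi-square statistic for k algorithms over n datasets."""
    k = len(avg_ranks)
    return 12 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )


def nemenyi_cd(k, n, q_alpha):
    """Critical difference: average ranks further apart than this differ significantly."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


ranks = average_ranks(errors)
chi2 = friedman_statistic(ranks, n=len(errors))
cd = nemenyi_cd(k=3, n=len(errors), q_alpha=2.343)
```

Two algorithms are considered significantly different under the Nemenyi test when their average ranks differ by at least `cd`.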
