1. Overview [paper]
This repository provides a basic implementation of the Decision Stream regression and classification algorithm. Unlike the classical decision tree approach, this method builds a directed acyclic graph with a high degree of connectivity by merging statistically indistinguishable nodes at each iteration.
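The merging step described above can be illustrated with a small sketch. This is not the repository's Clojure implementation: it assumes each node is represented only by the target values of its samples, and it swaps in a simple permutation test on the difference of means as a stand-in for whatever two-sample test the algorithm actually uses. The names `permutation_p_value` and `merge_similar_nodes` are hypothetical, as is the rule that two nodes merge when the p-value exceeds the threshold.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of means (stand-in test)."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


def merge_similar_nodes(nodes, threshold):
    """Greedily merge nodes whose target samples are statistically indistinguishable.

    Assumption: nodes merge when the p-value is above the threshold.
    """
    merged = []
    for node in nodes:
        for group in merged:
            if permutation_p_value(group, node) > threshold:
                group.extend(node)  # indistinguishable: fold into existing group
                break
        else:
            merged.append(list(node))  # distinguishable from all groups so far
    return merged


nodes = [
    [1.0, 1.1, 0.9, 1.05, 0.95],
    [1.02, 0.98, 1.1, 0.9, 1.0],
    [5.0, 5.1, 4.9, 5.2, 4.8],
]
merged = merge_similar_nodes(nodes, threshold=0.05)
```

In this toy input the first two nodes have nearly identical target values and end up in one group, while the third stays separate. Because merged groups can come from different branches, repeating this step across iterations produces a graph rather than a tree.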
- Clojure
- Apache Commons Math
- JBLAS (requires ATLAS or BLAS/LAPACK)
- OpenCSV
The dependencies are configured in the pom.xml file.
- Extract the archive `data.gz` with training data by running `tar -xvzf data.gz`
- Optional: rebuild `decision-stream.jar` with Leiningen (`lein uberjar`) or Maven (`mvn package`).
java -jar decision-stream.jar base-directory train-data train-answers test-data test-answers learning_mode significance-threshold
The program takes 7 input parameters:
- `base-directory` - path to the dataset
- `train-data` - file with training data
- `train-answers` - file with training answers
- `test-data` - file with test data
- `test-answers` - file with test answers
- `learning_mode`: `classification` or `regression` - classification or regression problem
- `significance-threshold` - threshold for merging/splitting operations
Example:
java -jar decision-stream.jar data/ailerons/ train_data.csv train_answ.csv test_data.csv test_answ.csv regression 0.02
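When the program is launched from a script, the seven parameters can be assembled and checked before starting the JVM. The sketch below is one possible wrapper, not part of the repository: `build_command` and `missing_files` are hypothetical helpers, and the check in `missing_files` assumes the four file names are resolved relative to `base-directory`, as the example above suggests.

```python
import subprocess
from pathlib import Path

LEARNING_MODES = ("classification", "regression")


def build_command(base_dir, train_data, train_answers, test_data, test_answers,
                  learning_mode, significance_threshold,
                  jar="decision-stream.jar"):
    """Return the argument list for the documented 7-parameter invocation."""
    if learning_mode not in LEARNING_MODES:
        raise ValueError(f"learning_mode must be one of {LEARNING_MODES}")
    # float() rejects values that are not numeric before the JVM is started.
    threshold = float(significance_threshold)
    return ["java", "-jar", jar, str(base_dir), train_data, train_answers,
            test_data, test_answers, learning_mode, str(threshold)]


def missing_files(base_dir, names):
    """List the file names that do not exist under base_dir.

    Assumption: the program joins base-directory with each file name.
    """
    return [name for name in names if not (Path(base_dir) / name).is_file()]


def run(command):
    """Run the assembled command and raise if the program exits with an error."""
    return subprocess.run(command, check=True)


command = build_command("data/ailerons/", "train_data.csv", "train_answ.csv",
                        "test_data.csv", "test_answ.csv", "regression", 0.02)
```

Calling `run(command)` then starts the same process as the example command line. Passing the arguments as a list avoids shell quoting problems when a path contains spaces.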
The following datasets are prepared for training in the `data` folder: