1. Overview [paper]
This repository provides a basic implementation of the Decision Stream regression and classification algorithm. Unlike the classical decision tree approach, this method builds a directed acyclic graph with a high degree of connectivity by merging statistically indistinguishable nodes at each iteration.
- Clojure
- Apache Commons Math
- JBLAS (requires ATLAS or BLAS/LAPACK)
- OpenCSV
The dependencies are configured in the pom.xml file.
- Extract the archive data.gz with training data by running: tar -xvzf data.gz
- Optional: rebuild decision-stream.jar with Leiningen (lein uberjar) or Maven (mvn package).
java -jar decision-stream.jar base-directory train-data train-answers test-data test-answers learning_mode significance-threshold
The program takes 7 input parameters:
- base-directory - path to the dataset
- train-data - file with training data
- train-answers - file with training answers
- test-data - file with test data
- test-answers - file with test answers
- learning_mode - classification or regression, depending on the type of problem
- significance-threshold - threshold for merging/splitting operations
Example:
java -jar decision-stream.jar data/ailerons/ train_data.csv train_answ.csv test_data.csv test_answ.csv regression 0.02
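Before launching a long training run, it can help to check the seven arguments first. The following is a minimal sketch and not part of this repository: check_args is a hypothetical helper, and it assumes that the four data files are resolved relative to base-directory, as the example above suggests. It does not validate the value of significance-threshold.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the decision-stream.jar arguments.
# Assumption: the four file names are looked up inside base-directory.

check_args() {
  # The program expects exactly 7 parameters.
  if [ "$#" -ne 7 ]; then
    echo "expected 7 arguments, got $#" >&2
    return 1
  fi

  base_dir=$1
  mode=$6

  # learning_mode must be one of the two documented values.
  case "$mode" in
    classification|regression) ;;
    *)
      echo "learning_mode must be classification or regression" >&2
      return 1
      ;;
  esac

  # Arguments 2 to 5 are the train/test data and answer files.
  for f in "$2" "$3" "$4" "$5"; do
    if [ ! -f "$base_dir/$f" ]; then
      echo "missing file: $base_dir/$f" >&2
      return 1
    fi
  done

  return 0
}

# Possible usage inside a wrapper script:
#   check_args "$@" && java -jar decision-stream.jar "$@"
```

The function only reports problems and returns a non-zero status, so the wrapper decides whether to start the JVM.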
The following datasets are prepared for training in the data folder:
