Decision-Tree

A generic decision tree classifier, which generates, prunes and visualises a decision tree based on an unseen dataset.

Introduction

This package allows the user to build a decision tree from a previously unseen dataset. Once the tree is built the user can test the accuracy of the tree, predict the class label of an unclassified datapoint and create a labeled visualisation of the decision tree. An additional feature allows the user to split their dataset into training, test and validation sets prior to building the decision tree.

Getting started with the package

To get started with this package clone this repo:

git clone https://github.com/mwolinska/Decision-Tree

Then enter the correct directory on your machine:

cd Decision-Tree

This package uses the Poetry dependency manager. To install all dependencies, run:

poetry install

Using the package

File structure

The file structure we propose is outlined below. It was used to generate the suggested commands.

.
├── Decision-Tree
│   └──Decision_Tree
└── Decision-Tree-Data
    └── Iris-Dataset
        └── iris.csv

Dataset format

Currently, this package only accepts datasets in csv format, where the following conditions need to be met:

data labels are the last column in string format,
feature labels are in the first row of the data,
feature data is numerical.

The dataset can be saved anywhere as it is passed as an argument.
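The conditions above can be checked before a run with a short script. The following is a minimal sketch, not part of the package: the function name `check_dataset_format` is hypothetical, and it assumes that "numerical" means every feature value parses as a float.

```python
import csv
from typing import Iterable, List


def check_dataset_format(lines: Iterable[str]) -> List[str]:
    """Return a list of problems found; an empty list means the format looks fine.

    Hypothetical helper, not part of the decision-tree package.
    """
    # Read all non-empty rows; the first one is treated as the feature labels.
    rows = [row for row in csv.reader(lines) if row]
    if len(rows) < 2:
        return ["expected a header row followed by at least one data row"]

    header, data = rows[0], rows[1:]
    problems = []
    for index, row in enumerate(data, start=1):
        if len(row) != len(header):
            problems.append(
                f"data row {index}: expected {len(header)} columns, got {len(row)}"
            )
            continue
        # Every column except the last must parse as a number (assumption).
        for name, value in zip(header[:-1], row[:-1]):
            try:
                float(value)
            except ValueError:
                problems.append(
                    f"data row {index}: feature {name!r} is not numerical: {value!r}"
                )
        # The last column holds the class label and must not be empty.
        if not row[-1].strip():
            problems.append(f"data row {index}: missing class label")
    return problems
```

It accepts any iterable of lines, so an open file works, for example `with open("iris.csv", newline="") as f: print(check_dataset_format(f))`.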

Available commands

The CLI is triggered with the decision-tree command, which launches the cli script. It has 3 available commands: run, load and help.

An example run using the iris dataset is outlined below.

Run command

This command takes a full dataset (in csv format) and separates it into training, validation and test sets. It then generates a decision tree based on the training data. Its optional arguments, prune (-p, --prune) and draw-tree (-d, --draw-tree), are shown in the examples below.

To create a decision tree based on the iris.csv dataset and save it as "decision_tree.pickle", the following command can be run:

decision-tree run <your-csv-file> <path-to-save-your-tree>/decision_tree.pickle
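To illustrate the splitting step that the run command performs before building the tree, here is a minimal sketch of a shuffled three-way split. It is not the package's implementation: the function name `split_dataset`, the default fractions and the fixed seed are all assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def split_dataset(
    rows: Sequence[Row],
    validation_fraction: float = 0.2,  # assumed value, not taken from the package
    test_fraction: float = 0.2,        # assumed value, not taken from the package
    seed: int = 0,
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Shuffle the rows and return (training, validation, test) lists."""
    if validation_fraction < 0 or test_fraction < 0:
        raise ValueError("fractions must be non-negative")
    if validation_fraction + test_fraction >= 1:
        raise ValueError("fractions must leave some rows for training")

    # Shuffle a copy so the caller's data is left untouched.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    n_validation = int(len(shuffled) * validation_fraction)

    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_validation]
    training = shuffled[n_test + n_validation:]
    return training, validation, test
```

In a setup like this, the training set is used to grow the tree and the test set to measure accuracy; the validation set is commonly what pruning is evaluated against.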

To set either the prune or draw-tree variables, use one of the following syntaxes:

decision-tree run <your-csv-file> -p False -d <path-to-save-your-visuals>/<desired-folder-name>/

Or:

decision-tree run <your-csv-file> --prune False --draw-tree <path-to-save-your-visuals>/<desired-folder-name>/

Once a run is completed, if the draw-tree argument was set to True the decision tree will be saved under "tree_visual.pdf" in the project directory. If the feature and label names are added to the training dataset, those are included in the tree visualisation. The tree generated using the run above would look like this:

If the prune variable is set to True the pruned tree visualisation will be saved under "pruned_tree.pdf" in the project directory. For this run it would look like this:

If the feature names are not included in the dataset the tree will be labeled using column indices as feature numbers. This image is generated using a different run than those above.

Load command

The load command allows the user to load an existing decision tree (in pickle format) and generate predictions for a dataset. The required arguments are, in order: the pickle file, the csv file of samples and the path where the predictions will be saved.

An example run would look like this:

decision-tree load <your-pickle-file> <your-samples-csv-file> <path-to-save-your-predictions>/predictions.csv
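The flow of the load command (unpickle a tree, predict a label for each sample, write the labels to a csv file) can be sketched as below. This is an illustration only: the package's pickle holds its own tree object, whose API is not documented here, so the sketch uses a hypothetical nested-dict tree with `feature`, `threshold`, `left` and `right` keys. The header row in the samples file and the `prediction` column name are also assumptions.

```python
import csv
import pickle
from typing import List


def predict_label(node, features: List[float]) -> str:
    """Walk a hypothetical dict-based tree until a leaf (a plain label string)."""
    while isinstance(node, dict):
        go_left = features[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node


def predict_to_csv(pickle_path: str, samples_path: str, output_path: str) -> int:
    """Load a pickled tree, predict one label per sample row, save them as csv.

    Returns the number of predictions written.
    """
    with open(pickle_path, "rb") as handle:
        tree = pickle.load(handle)

    with open(samples_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the feature-label row (assumed to be present)
        samples = [[float(value) for value in row] for row in reader if row]

    with open(output_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["prediction"])  # assumed column name
        for sample in samples:
            writer.writerow([predict_label(tree, sample)])
    return len(samples)
```

Note that unpickling can execute arbitrary code, so only pickle files from a trusted source should be loaded.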

Help command

Default command to view the available commands.

Using the package with docker

A docker image of the package is available here.

To download the docker image run:

docker pull mwolinska/decision-tree:latest

To load and save data outside of the docker image it is necessary to mount a directory from your machine into the docker image. The following command runs the decision-tree run command, saves the output and generates the visuals.

docker run \
  -v $(pwd)/Decision-Tree-Data:/workdir/All-Data \
  -it mwolinska/decision-tree:latest \
  run /workdir/All-Data/Iris-Dataset/iris.csv /workdir/All-Data/Iris-Dataset/test.pickle \
  -d /workdir/All-Data/Iris-Dataset/visual/

This will result in the following files being generated:

Decision-Tree-Data
└── Iris-Dataset
    ├── iris.csv
    ├── test.pickle
    └── visual
        ├── pruned_tree
        ├── pruned_tree.pdf
        ├── unpruned_tree_visual
        └── unpruned_tree_visual.pdf
