This repository contains the skeleton code and dataset files that you need in order to complete the coursework.
The data/ directory contains the datasets you need for the coursework.
The primary datasets are:
- train_full.txt
- train_sub.txt
- train_noisy.txt
- validation.txt
Some simpler datasets that you may use to help you with implementation or debugging:
- toy.txt
- simple1.txt
- simple2.txt
The official test set is test.txt. Please use this dataset sparingly and
purely to report the results of evaluation. Do not use this to optimise your
classifier (use validation.txt for this instead).
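
As a starting point for reading these files, here is a minimal sketch of a loader. The file layout is not described in this README, so the sketch assumes each non-empty line holds comma-separated integer attributes followed by a class label; check this against the files in data/ before relying on it. The names `parse_dataset` and `load_dataset` are hypothetical helpers, not part of the skeleton code.

```python
def parse_dataset(lines):
    """Split each line into integer attributes and a trailing class label.

    Assumed layout (not confirmed by the README): "v1,v2,...,vN,label".
    """
    attributes, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        *values, label = line.split(",")
        attributes.append([int(value) for value in values])
        labels.append(label)
    return attributes, labels


def load_dataset(path):
    """Read a dataset file from disk using the assumed layout above."""
    with open(path, encoding="utf-8") as handle:
        return parse_dataset(handle)
```

With this in place, something like `load_dataset("data/validation.txt")` would be used while tuning, and `load_dataset("data/test.txt")` only once, for the final reported evaluation.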
- classification.py
  - Contains the skeleton code for the DecisionTreeClassifier class. Your task is to implement the train(), predict() and prune() methods.
- improvement.py
  - Contains the skeleton code for the train_and_predict() function (Task 4.2). Complete this function as an interface to your new/improved decision tree classifier.
- example_main.py
  - Contains an example of how the evaluation script on LabTS might use the classes and invoke the methods/functions defined in classification.py and improvement.py.
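
To illustrate the train-then-predict flow described above without depending on the skeleton files, here is a rough sketch that uses a stand-in class. `StubDecisionTreeClassifier` is hypothetical and simply predicts the most common training label. The `train(x, y)` and `predict(x)` signatures are assumptions, so check classification.py and example_main.py for the actual ones. prune() and train_and_predict() are left out because their parameters are not stated here.

```python
from collections import Counter


class StubDecisionTreeClassifier:
    """Hypothetical stand-in for DecisionTreeClassifier; always predicts the majority label."""

    def __init__(self):
        self.majority_label = None

    def train(self, x, y):
        # Assumed signature: x is a list of attribute rows, y the matching labels.
        self.majority_label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, x):
        if self.majority_label is None:
            raise RuntimeError("train() must be called before predict()")
        return [self.majority_label for _ in x]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Tiny made-up samples, for illustration only (not taken from data/).
x_train = [[1, 0], [0, 1], [1, 1]]
y_train = ["A", "A", "C"]
x_val = [[0, 0], [1, 0]]
y_val = ["A", "C"]

classifier = StubDecisionTreeClassifier()
classifier.train(x_train, y_train)
predictions = classifier.predict(x_val)
print(predictions, accuracy(y_val, predictions))
```

Swapping the stub for your own DecisionTreeClassifier should leave the rest of the flow unchanged, provided the real method signatures match the ones assumed here.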
< Insert your own instructions here >