Decision Tree Classifier

Make sure your repo is up-to-date

Assignment code may be modified during the semester, so pull from this repo first and overwrite the DecisionTree folder in your own repo with the updated one.

Build your own decision tree classifier (with continuous input)

Expectation

Given the same inputs, my_DT.py should behave the same as the DecisionTreeClassifier in sklearn.

Implement my_DT.fit() function in my_DT.py

Inputs:

X: pd.DataFrame, independent variables, each value is a continuous number of float type
y: list, np.array or pd.Series, dependent variables, each value is a category of int or str type

Implement my_DT.predict() function in my_DT.py

Input:

X: pd.DataFrame, independent variables, each value is a continuous number of float type

Output:

Predicted categories of each input data point. List of str or int.
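As a rough illustration of the prediction step, the sketch below assumes each leaf of the fitted tree stores the class counts of the training points that reached it, and returns the majority class. The function name majority_class and the alphabetical tie-breaking are assumptions of this sketch, not part of the provided code.

```python
def majority_class(leaf_counts):
    """Return the most frequent class among the counts stored at a leaf."""
    # Iterating over sorted labels makes ties deterministic: max() keeps the
    # first maximal element, so a tie goes to the smallest label.
    # (Tie-breaking rule is an assumption of this sketch.)
    return max(sorted(leaf_counts), key=lambda label: leaf_counts[label])


# Hypothetical leaf reached by a test data point.
reached_node = {"1": 2, "2": 1}
print(majority_class(reached_node))
```

predict() would then collect one such label per row of X into a list.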

Implement my_DT.predict_proba() function in my_DT.py

Input:

X: pd.DataFrame, independent variables, each value is a continuous number of float type

Output:

Prediction probabilities of each input data point belonging to each category. pd.DataFrame(list of prob, columns = self.classes_).

Example:

self.classes_ = {"2", "1"}
the reached node for the test data point has {"1":2, "2":1}
then the prob for that data point is {"2": 1/3, "1": 2/3}
return probs = pd.DataFrame(list of prob, columns = self.classes_)
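The arithmetic in the example above can be sketched in plain Python. This assumes the class labels are kept in an ordered list (so the column order is fixed) and that the reached node stores a dict of class counts; leaf_probabilities is a hypothetical helper name.

```python
def leaf_probabilities(leaf_counts, classes):
    """Turn the class counts stored at a leaf into one probability per class."""
    total = sum(leaf_counts.values())
    # A class that never reached this leaf gets probability 0.
    return [leaf_counts.get(label, 0) / total for label in classes]


classes_ = ["2", "1"]              # assumed to be an ordered list in this sketch
reached_node = {"1": 2, "2": 1}    # counts from the example above
row = leaf_probabilities(reached_node, classes_)
print(dict(zip(classes_, row)))
```

Collecting one such row per test data point gives the "list of prob" that is passed to pd.DataFrame(list of prob, columns = self.classes_).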

Test my_DT decision tree classifier with A6.py

It is expected to perform the same as sklearn.tree.DecisionTreeClassifier.
Expected output:

(base) zhe@Zhe-Yus-MacBook-Pro DecisionTree % python A6.py
['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-virginica', 'Iris-virginica', 'Iris-virginica', 'Iris-virginica', 'Iris-virginica']
Iris-setosa     1.000000
Iris-setosa     1.000000
Iris-setosa     1.000000
Iris-setosa     1.000000
Iris-setosa     1.000000
Iris-versicolor 1.000000
Iris-versicolor 1.000000
Iris-versicolor 1.000000
Iris-versicolor 1.000000
Iris-versicolor 1.000000
Iris-virginica  1.000000
Iris-virginica  1.000000
Iris-virginica  1.000000
Iris-virginica  1.000000
Iris-virginica  1.000000

Do not forget to push your local changes to the GitHub server.

Grading Policy

Importing additional packages such as sklearn is not allowed.
4 (out of 7) points will be received if A6.py successfully runs and makes predictions.
The remaining 3 points will be given based on the percentage of predictions that match those of the correct implementation.
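To check your own progress, the share of matching predictions can be computed as below. This is only a sketch of the comparison: agreement_rate is a hypothetical helper, and how the graders convert the percentage into points is not specified here.

```python
def agreement_rate(student_predictions, reference_predictions):
    """Fraction of positions where the two prediction lists agree."""
    if len(student_predictions) != len(reference_predictions):
        raise ValueError("prediction lists must have the same length")
    if not reference_predictions:
        raise ValueError("prediction lists must not be empty")
    matches = sum(s == r for s, r in zip(student_predictions, reference_predictions))
    return matches / len(reference_predictions)


# Made-up labels for illustration only.
mine = ["a", "b", "c", "d"]
reference = ["a", "b", "c", "x"]
print(agreement_rate(mine, reference))
```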

Hint

If my_DT.py is too difficult to implement, you can try to complete my_DT_hint.py.
my_DT_hint.py has the main functions already implemented. Students only need to complete two functions: find_best_split() and impurity().
Then, remember to rename it to my_DT.py before submitting.
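For orientation, here is a minimal sketch of the two ideas behind those functions, written for a single feature in plain Python. It assumes Gini impurity as the criterion and midpoints between consecutive distinct values as candidate thresholds; the names gini and best_split_one_feature and their signatures are hypothetical and will not match the ones in my_DT_hint.py.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split_one_feature(values, labels):
    """Return (threshold, weighted impurity) of the best binary split.

    Returns (None, parent impurity) when no candidate split lowers the impurity.
    """
    n = len(values)
    best_threshold, best_score = None, gini(labels)
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        # Impurity of the two children, weighted by their sizes.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Made-up data for illustration only.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]
print(gini(labels))
print(best_split_one_feature(values, labels))
```

A full implementation would repeat the search over every column of X and keep the feature and threshold with the lowest weighted impurity.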
