Syllabus |
Slides and Assignments |
Project |
Lecturer
Assignment codes might be modified during the semester so please pull from this repo first and overwrite your repo with the DecisionTree folder.
my_DT.py should behave the same as the DecisionTreeClassifier in sklearn with the same set of inputs.
Implement my_DT.fit() function in my_DT.py
Inputs:
- X: pd.DataFrame, independent variables, each value is a continuous number of float type
- y: list, np.array or pd.Series, dependent variables, each value is a category of int or str type
Implement my_DT.predict() function in my_DT.py
Input:
- X: pd.DataFrame, independent variables, each value is a continuous number of float type
Output:
- Predicted categories of each input data point. List of str or int.
Implement my_DT.predict_proba() function in my_DT.py
Input:
- X: pd.DataFrame, independent variables, each value is a continuous number of float type
Output:
- Prediction probabilities of each input data point belonging to each categories. pd.DataFrame(list of prob, columns = self.classes_).
Example:
- self.classes_ = {"2", "1"}
- the reached node for the test data point has {"1":2, "2":1}
- then the prob for that data point is {"2": 1/3, "1": 2/3}
- return probs = pd.DataFrame(list of prob, columns = self.classes_)
Test my_DT decision tree classifier with A6.py
- It is expected to perform the same with sklearn.tree.DecisionTreeClassifier.
- Expected output:
(base) zhe@Zhe-Yus-MacBook-Pro DecisionTree % python A6.py
['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor', 'Iris-virginica', 'Iris-virginica', 'Iris-virginica', 'Iris-virginica', 'Iris-virginica']
Iris-setosa 1.000000
Iris-setosa 1.000000
Iris-setosa 1.000000
Iris-setosa 1.000000
Iris-setosa 1.000000
Iris-versicolor 1.000000
Iris-versicolor 1.000000
Iris-versicolor 1.000000
Iris-versicolor 1.000000
Iris-versicolor 1.000000
Iris-virginica 1.000000
Iris-virginica 1.000000
Iris-virginica 1.000000
Iris-virginica 1.000000
Iris-virginica 1.000000
- importing additional packages such as sklearn is not allowed.
- 4 (out of 7) points will be received if A6.py successfully runs and makes predictions.
- The rest 3 points will be given based on the percentage of same predictions with the correct implementation.
- If my_DT.py is too difficult to implement, you can try to complete my_DT_hint.py.
- my_DT_hint.py has the main functions already implemented. Students only need to complete two functions---find_best_split() and impurity().
- Then, remember to rename it as my_DT.py before submitting.