GitHub

Decision Tree assignment

Implementation steps:

1- Import Important Libraries and load the dataset.

2- Load the training dataset.

3- Get some information about the training dataset.

4- Count the numbers occurrence in the label column.

5- Split data into x_train and y_train.

6- Load the test dataset.

7- Count the numbers occurrence in label column.

8- Split the test data into x_test and y_test.

9- Train the Decision Tree classification Model.

10- Train SVM model.

11- Apply Hard and soft Voting Classifiers.

12- Apply Bagging Classifier in range[10:200].

13- Train bagging classifier with the best number of estimators.

14- Apply Random Forest classifier: (additional model code).

15- Train Boosting Classifier in range[10:200].

16- Train the boosting classifier at n estimator = 100 and with range [0.1:0.9].

17- Train Boosting classifier with n_estimators=100, learning_rate=0.3.

18- Apply XGBOOST classification model with n_estimators=100, learning_rate=0.3

Which metric is the best to compare performance, accuracy or the confusion matrix?

- Accuracy performance metrics can be critical when dealing with imbalanced data and 
  overfitting.

- The confusion matrix, recall, precision, and F1 score gives good obviousness of prediction
   results comparing with the accuracy.

- The F1-score can tell us if there is overfitting or not.

- After seeing the F1-score, we notice that there is no overfitting in the data, so the accuracy is 
  good here than the confusion matrix.

Conclusion:

  - During this assignment we learned how to apply different classification techniques, such as 
     decision tree classifier, and apply bagging and boosting with the best estimator and learning 
     rate.
  
  - After comparing the models with each other, we noticed that the decision tree model is the 
    lowest accuracy, which is 0.92, and the SVM model is the highest accuracy, which is 0.98
  
  - After applying soft voting and hard voting, the hard voting is the best, which have accuracy =
    0.94, and the soft voting is the worst, which have accuracy = 0.92.
  
  - After comparing bagging and boosting, the best accuracy is boosting classifier, which have 
     accuracy = 0.965,
  
  - When comparing the XGBOOST model(with the best number of estimator and 
    hyperparameter like the boosting classifier) with Gradient Boosting classifier, the accuracy of 
    XGBOOST is the higher which ism = 0.9668

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Assignment_4.pdf		Assignment_4.pdf
Group_8_Assignment_4_Report.pdf		Group_8_Assignment_4_Report.pdf
Group_8_Machine_Learning_Assignment_4 (1).ipynb		Group_8_Machine_Learning_Assignment_4 (1).ipynb
README.md		README.md
group_8_machine_learning_assignment_4.py		group_8_machine_learning_assignment_4.py
pendigits-tes.csv		pendigits-tes.csv
pendigits-tra.csv		pendigits-tra.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

About

Uh oh!

Releases

Packages

Languages

BaSma99/Decision_Tree

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages