
Decision Tree assignment

Implementation steps (illustrative code sketches follow the list):

1- Import the required libraries.

2- Load the training dataset.

3- Get basic information about the training dataset (shape, data types, missing values).

4- Count the occurrences of each class in the label column.

5- Split the training data into x_train and y_train.

6- Load the test dataset.

7- Count the occurrences of each class in the test label column.

8- Split the test data into x_test and y_test.

9- Train the Decision Tree classification model.

10- Train an SVM model.

11- Apply hard and soft voting classifiers.

12- Apply a bagging classifier, sweeping n_estimators over the range [10, 200].

13- Retrain the bagging classifier with the best number of estimators.

14- Apply a Random Forest classifier as an additional model.

15- Train a Gradient Boosting classifier, sweeping n_estimators over the range [10, 200].

16- Train the boosting classifier with n_estimators = 100, sweeping the learning rate over the range [0.1, 0.9].

17- Train the boosting classifier with n_estimators=100 and learning_rate=0.3.

18- Apply an XGBoost classification model with n_estimators=100 and learning_rate=0.3.
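
A minimal sketch of steps 1-10, assuming the data ships as train.csv and test.csv with the target in a label column (the file and column names here are assumptions, not taken from the repo):

```python
import pandas as pd
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Load the training and test sets (file names are assumptions).
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Inspect the training data and the class balance of the labels.
train.info()
print(train["label"].value_counts())
print(test["label"].value_counts())

# Split features from the target column.
x_train, y_train = train.drop(columns="label"), train["label"]
x_test, y_test = test.drop(columns="label"), test["label"]

# Train a decision tree and an SVM (probability=True so the SVM can
# also take part in soft voting later).
tree = DecisionTreeClassifier(random_state=42).fit(x_train, y_train)
svm = SVC(probability=True, random_state=42).fit(x_train, y_train)

print("Decision tree accuracy:", tree.score(x_test, y_test))
print("SVM accuracy:", svm.score(x_test, y_test))
```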
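
Step 11 combines the two models in hard and soft voting ensembles; this sketch reuses x_train, y_train, x_test, and y_test from above:

```python
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

estimators = [
    ("tree", DecisionTreeClassifier(random_state=42)),
    ("svm", SVC(probability=True, random_state=42)),
]

# Hard voting takes the majority class label; soft voting averages
# the predicted class probabilities.
for voting in ("hard", "soft"):
    clf = VotingClassifier(estimators=estimators, voting=voting)
    clf.fit(x_train, y_train)
    print(voting, "voting accuracy:", clf.score(x_test, y_test))
```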
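
For steps 12-14, one way to sweep the bagging ensemble size over [10, 200] (the step of 10 is an assumption; the repo only gives the range), then reuse the best size for a random forest:

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

# Sweep the ensemble size and keep the best test accuracy.
# BaggingClassifier bags decision trees by default.
best_n, best_acc = None, 0.0
for n in range(10, 201, 10):
    bag = BaggingClassifier(n_estimators=n, random_state=42)
    bag.fit(x_train, y_train)
    acc = bag.score(x_test, y_test)
    if acc > best_acc:
        best_n, best_acc = n, acc
print("Best bagging:", best_n, "estimators, accuracy", best_acc)

# Random forest as an additional bagging-style model (step 14).
forest = RandomForestClassifier(n_estimators=best_n, random_state=42)
forest.fit(x_train, y_train)
print("Random forest accuracy:", forest.score(x_test, y_test))
```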
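
Steps 15-18 do the same kind of sweep for boosting, then compare against XGBoost; this assumes the labels are already integer-encoded as XGBoost expects:

```python
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier

# Step 15: sweep the number of boosting stages over [10, 200].
for n in range(10, 201, 10):
    gb = GradientBoostingClassifier(n_estimators=n, random_state=42)
    gb.fit(x_train, y_train)
    print(n, "estimators:", gb.score(x_test, y_test))

# Step 16: fix n_estimators=100 and sweep the learning rate over [0.1, 0.9].
for lr in [round(0.1 * i, 1) for i in range(1, 10)]:
    gb = GradientBoostingClassifier(
        n_estimators=100, learning_rate=lr, random_state=42
    ).fit(x_train, y_train)
    print("learning rate", lr, ":", gb.score(x_test, y_test))

# Step 17: retrain with the chosen hyperparameters.
gb = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.3, random_state=42
).fit(x_train, y_train)
print("Gradient boosting accuracy:", gb.score(x_test, y_test))

# Step 18: XGBoost with the same hyperparameters for comparison.
xgb = XGBClassifier(n_estimators=100, learning_rate=0.3, random_state=42)
xgb.fit(x_train, y_train)
print("XGBoost accuracy:", xgb.score(x_test, y_test))
```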

Which metric is best for comparing performance: accuracy or the confusion matrix?

- Accuracy alone can be misleading when the data is imbalanced or the model overfits, since a
  classifier can score high simply by favoring the majority class.

- The confusion matrix, together with recall, precision, and the F1 score, gives a clearer view
  of the prediction results than accuracy alone.

- The F1-score can also indicate whether the model is overfitting.

- After inspecting the F1-score, we see no sign of overfitting, so accuracy is an adequate
  comparison metric here, and the confusion matrix adds little extra information.
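
As a minimal sketch, these metrics can be computed with scikit-learn, reusing any fitted model from the sketches above:

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
)

y_pred = svm.predict(x_test)  # any fitted model from the sketches above

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
# Per-class precision, recall, and F1 in one report.
print(classification_report(y_test, y_pred))
```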

Conclusion:

  - During this assignment we learned how to apply different classification techniques, such as
    the decision tree classifier, and how to tune bagging and boosting for the best number of
    estimators and learning rate.
  
  - Comparing the individual models, the decision tree has the lowest accuracy (0.92) and the
    SVM has the highest (0.98).
  
  - Between the two voting schemes, hard voting performs better with an accuracy of 0.94, while
    soft voting is lower at 0.92.
  
  - Comparing bagging and boosting, the boosting classifier achieves the better accuracy, 0.965.
  
  - Comparing XGBoost (with the same number of estimators and the same hyperparameters as the
    boosting classifier) against Gradient Boosting, XGBoost is slightly higher at 0.9668.
