yash2396/Spam-Classification

Spam-Classification

Spam Detection Classification Model

Rather than committing to a single model for this problem, I implemented 9 models (some are the same model with different hyperparameters).

10-fold cross-validation (a KFold of 10) is used. The number of folds can be increased, but doing so does not significantly improve the accuracy and only increases the run time.

For each classifier, the code generates a table with the false positives, false negatives, and accuracy for each iteration of the KFold. The last row of the table shows the average accuracy across the 10 folds. There are 9 such tables, one for each classifier.
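The per-fold table described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code: `kfold_indices`, `fold_table`, and `threshold_rule` are hypothetical helpers, the data is a toy set, and label 1 is assumed to mean spam.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def kfold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds of (train, test) indices."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits


def fold_table(X: Sequence, y: Sequence[int], fit_predict: Callable, k: int = 10) -> List[Dict]:
    """Return one row per fold (FP, FN, accuracy) plus a final average row."""
    rows = []
    for fold, (train, test) in enumerate(kfold_indices(len(y), k), start=1):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        pairs = list(zip(preds, [y[i] for i in test]))
        rows.append({
            "fold": fold,
            # False positive: predicted spam (1) but actually not spam (0).
            "fp": sum(1 for p, t in pairs if p == 1 and t == 0),
            # False negative: predicted not spam (0) but actually spam (1).
            "fn": sum(1 for p, t in pairs if p == 0 and t == 1),
            "accuracy": sum(1 for p, t in pairs if p == t) / len(pairs),
        })
    mean_acc = sum(r["accuracy"] for r in rows) / len(rows)
    rows.append({"fold": "average", "fp": None, "fn": None, "accuracy": mean_acc})
    return rows


def threshold_rule(X_train, y_train, X_test):
    """Toy stand-in classifier: cut at the midpoint between the two class means."""
    spam = [x for x, label in zip(X_train, y_train) if label == 1]
    ham = [x for x, label in zip(X_train, y_train) if label == 0]
    cut = (sum(spam) / len(spam) + sum(ham) / len(ham)) / 2
    return [1 if x > cut else 0 for x in X_test]


# Toy data (an assumption for illustration): one numeric feature per row,
# with alternating labels so every fold contains both classes.
y = [i % 2 for i in range(40)]
X = [(5.0 if label else 1.0) + (i % 3) for i, label in enumerate(y)]

for row in fold_table(X, y, threshold_rule, k=10):
    print(row)
```

Any function with the same `(X_train, y_train, X_test)` signature can be passed in place of `threshold_rule`, so the same table can be produced for each of the 9 classifiers.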

At the end, the classifier with the highest accuracy is reported along with its accuracy.
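Picking the best classifier amounts to taking the maximum over the per-classifier average accuracies. A small sketch, assuming those averages are collected in a dictionary; `best_classifier` and the model names are hypothetical, and the scores are placeholders rather than results from this repository.

```python
from typing import Dict, Tuple


def best_classifier(mean_accuracy: Dict[str, float]) -> Tuple[str, float]:
    """Return the (name, accuracy) pair with the highest mean accuracy."""
    name = max(mean_accuracy, key=mean_accuracy.get)
    return name, mean_accuracy[name]


# Placeholder scores for illustration only; not results from the repository.
scores = {"model_a": 0.90, "model_b": 0.93, "model_c": 0.91}

name, acc = best_classifier(scores)
print(f"Best classifier: {name} with accuracy {acc:.1%}")
```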

Results

It can be inferred from the table that the Random Forest Classifier gives the best accuracy, around 95.7%, followed by the Support Vector Classifier (SVC). Tweaking the hyperparameters of the Random Forest Classifier changes the accuracy by around 0.2%. Moreover, since the dataset is randomly shuffled at the beginning to mix the spam and non-spam rows, each run of the code may produce a slightly different maximum accuracy (a difference of about ±0.2%, as observed over multiple executions of the code).
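The run-to-run variation comes from the initial shuffle. The sketch below assumes a plain-Python shuffle; `shuffle_rows` and its `seed` parameter are hypothetical, and the repository may shuffle differently. It shows that features and labels have to be shuffled together, and that fixing a seed is one way to make the row order, and therefore the folds, repeatable between runs.

```python
import random
from typing import List, Optional, Sequence, Tuple


def shuffle_rows(X: Sequence, y: Sequence[int],
                 seed: Optional[int] = None) -> Tuple[List, List[int]]:
    """Shuffle features and labels together so each row keeps its label."""
    paired = list(zip(X, y))
    # seed=None draws from system entropy, so the order differs on each run;
    # an integer seed gives the same order every time.
    random.Random(seed).shuffle(paired)
    X_shuffled = [x for x, _ in paired]
    y_shuffled = [label for _, label in paired]
    return X_shuffled, y_shuffled


# Toy rows (an assumption for illustration): spam rows first, then non-spam.
X = [f"message {i}" for i in range(10)]
y = [1] * 5 + [0] * 5

first = shuffle_rows(X, y, seed=42)
second = shuffle_rows(X, y, seed=42)
print(first == second)  # same seed, same ordering
```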
