Spam-Classification

Spam Detection Classification Model

Rather than committing to a single model for this problem, I implemented 9 models (some are the same model with different hyperparameters).
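As a rough illustration of what such a set of candidates could look like in scikit-learn, here is a minimal sketch. Only Random Forest and SVC are named in this README; the other model types and every hyperparameter value below are assumptions chosen for illustration, not the notebook's actual configuration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical candidate set: nine entries, some of which are the same
# model type with different hyperparameters. The names and values are
# illustrative assumptions, not taken from the notebook.
candidates = {
    "RandomForest(n_estimators=100)": RandomForestClassifier(
        n_estimators=100, random_state=0),
    "RandomForest(n_estimators=300)": RandomForestClassifier(
        n_estimators=300, random_state=0),
    "RandomForest(max_depth=20)": RandomForestClassifier(
        n_estimators=100, max_depth=20, random_state=0),
    "SVC(rbf)": SVC(kernel="rbf", C=1.0),
    "SVC(linear)": SVC(kernel="linear", C=1.0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNeighbors(k=5)": KNeighborsClassifier(n_neighbors=5),
    "GaussianNB": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
}

for name in candidates:
    print(name)
```

Keeping the candidates in a dictionary keyed by a descriptive name makes it straightforward to loop over them and label each results table.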

10-fold cross-validation (KFold) is used. The number of folds can be increased, but doing so does not significantly improve accuracy and only increases the run time.
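To make the splitting concrete, the sketch below uses scikit-learn's `KFold` on a small stand-in array. The row count of 50 and the `random_state` value are assumptions for illustration; the notebook's own call may differ.

```python
import numpy as np
from sklearn.model_selection import KFold

n_rows = 50  # hypothetical row count, kept small for illustration
X = np.arange(n_rows).reshape(-1, 1)

# 10 folds: each row lands in the held-out fold exactly once.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)

fold_sizes = []
seen = []
for train_idx, test_idx in kfold.split(X):
    fold_sizes.append(len(test_idx))
    seen.extend(test_idx.tolist())

print(fold_sizes)
```

Each model is fitted once per fold, so the number of fits grows linearly with the number of folds, which is why raising it mainly costs run time.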

For each classifier, the code generates a table with the false positives, false negatives, and accuracy for each fold. The last row of the table shows the average accuracy across the 10 folds. There are 9 such tables, one per classifier.
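The per-fold rows described above can be sketched in plain Python as follows. The function names `fold_row` and `build_table` are hypothetical, the toy labels are made up, and the encoding 1 = spam, 0 = not spam is an assumption about how the labels are stored.

```python
def fold_row(y_true, y_pred):
    """Return (false_positives, false_negatives, accuracy) for one fold.

    Assumes labels are encoded as 1 = spam and 0 = not spam.
    """
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    return fp, fn, correct / len(pairs)


def build_table(folds):
    """Build one row per fold plus the average accuracy across folds."""
    rows = [fold_row(y_true, y_pred) for y_true, y_pred in folds]
    mean_acc = sum(row[2] for row in rows) / len(rows)
    return rows, mean_acc


# Toy data only: two folds of four made-up labels each.
toy_folds = [
    ([1, 0, 1, 0], [1, 1, 1, 0]),  # one false positive
    ([0, 0, 1, 1], [0, 0, 0, 1]),  # one false negative
]

rows, mean_acc = build_table(toy_folds)
for i, (fp, fn, acc) in enumerate(rows, start=1):
    print(f"fold {i}: FP={fp} FN={fn} accuracy={acc:.2f}")
print(f"average accuracy={mean_acc:.2f}")
```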

At the end, the classifier with the highest accuracy is reported along with its accuracy.
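One simple way to perform this final selection is to average the fold accuracies per classifier and take the maximum. In the sketch below, the model names and accuracy values are placeholders made up for illustration, not results from the notebook.

```python
from statistics import mean

# Placeholder fold accuracies (NOT results from the notebook).
fold_accuracies = {
    "model_a": [0.90, 0.92, 0.91],
    "model_b": [0.94, 0.95, 0.93],
    "model_c": [0.88, 0.89, 0.90],
}

# Average across folds, then pick the classifier with the highest mean.
mean_accuracy = {name: mean(scores) for name, scores in fold_accuracies.items()}
best_name = max(mean_accuracy, key=mean_accuracy.get)

print(f"best: {best_name} ({mean_accuracy[best_name]:.3f})")
```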

Results

The tables show that the Random Forest Classifier gives the best accuracy, around 95.7%, followed by the Support Vector Classifier (SVC). Tweaking the hyperparameters of the Random Forest Classifier changes the accuracy by around 0.2%. Moreover, because the dataset is randomly shuffled at the beginning to mix the spam and non-spam rows, each run of the code may produce a slightly different maximum accuracy (a difference of about ±0.2%, as observed over multiple executions of the code).
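If repeatable numbers are wanted, one option is to fix the seed used for the initial shuffle. The sketch below shows the idea with the standard library only. The helper `shuffled_indices` is hypothetical, and how the notebook actually shuffles the rows is not stated here, so treat this as an assumption rather than the author's method.

```python
import random


def shuffled_indices(n_rows, seed=None):
    """Return row indices in shuffled order.

    With a fixed seed the order is the same on every run;
    with seed=None it varies from run to run.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return indices


# Same seed, same order: the folds and the reported accuracy become repeatable.
first = shuffled_indices(10, seed=42)
second = shuffled_indices(10, seed=42)
print(first == second)
```

Without a fixed seed, the rows fall into different folds on each run, which is consistent with the small run-to-run variation described above.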

Repository files

.gitignore
README.md
Spam Detection.ipynb
spambase.DOCUMENTATION
spambase.data
spambase.names

yash2396/Spam-Classification
