
Balakishan77/Spam-Email-Classifier


Spam-Email-Classification

Analyzing the content of an email dataset that contains over 5000 email samples, each labeled as spam or not. We built a model that classifies a given email as spam (junk email) or ham (good email) using the Naive Bayes classification algorithm, achieving an accuracy score of ~99%.

# Naive Bayes Classifier Introduction

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the class.

# Checking the distribution of data
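Before plotting, the class balance and the text-length statistics can be checked in plain Python. This is a minimal, hypothetical sketch rather than the repository's code: it assumes each email is a `(text, label)` pair labeled `"spam"` or `"ham"`, and it measures length in characters, since the original does not state the unit. `describe_dataset` is an illustrative name.

```python
from collections import Counter
from statistics import mean, median


def describe_dataset(emails):
    """Summarise class balance and text-length statistics.

    `emails` is assumed to be a list of (text, label) pairs with
    label "spam" or "ham"; this format is hypothetical.
    """
    class_counts = Counter(label for _, label in emails)
    lengths = [len(text) for text, _ in emails]  # length in characters
    return {
        "class_counts": dict(class_counts),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "mean_length": mean(lengths),
        "median_length": median(lengths),
    }


# Toy data for illustration only, not the repository's dataset.
emails = [
    ("Win a free prize now", "spam"),
    ("Meeting moved to 3pm", "ham"),
    ("Lunch tomorrow?", "ham"),
]
print(describe_dataset(emails))
```

On a real dataset, a maximum length far above the median is a quick numeric hint of the extreme outliers that the histogram makes visible.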

[Figure: histogram of email text length, with outliers]

We can see some extreme outliers, so we set a threshold on the text length (here 10000) and plot the histogram again. Note that this threshold is not applied in the algorithm implementation.

[Figure: histogram of email text length after applying the 10000 threshold]
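The thresholding step can be sketched as a filter followed by re-binning. This assumes length is counted in characters and that texts exactly at the threshold are kept; the original specifies neither. `filter_by_length` and `histogram` are hypothetical helpers, and, as noted above, the filter only affects the plot, not the classifier.

```python
def filter_by_length(texts, threshold=10000):
    """Keep only texts whose length is at most `threshold` characters."""
    return [t for t in texts if len(t) <= threshold]


def histogram(values, bins=10):
    """Count values into `bins` equal-width bins, as a plotted histogram would."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # avoid a zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Toy texts for illustration; the last one is an extreme outlier.
texts = ["a" * 50, "b" * 120, "c" * 300, "d" * 25000]
kept = filter_by_length(texts)
lengths = [len(t) for t in kept]
print(histogram(lengths, bins=5))  # → [1, 1, 0, 0, 1]
```

Without the filter, the single 25000-character text would stretch the bin width so far that almost every other email lands in the first bin.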

Below are the evaluation metrics for the model:

# Confusion Matrix

[Figure: confusion matrix]
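The README does not show its training code, so the following is only a hypothetical, from-scratch sketch of one common Naive Bayes variant (multinomial, with add-one Laplace smoothing) together with the four confusion-matrix counts. The whitespace tokenizer, the `"spam"`/`"ham"` labels, and all function names are assumptions, not the repository's implementation. The returned tuple uses the order (TN, FP, FN, TP), which is also the order scikit-learn's `confusion_matrix(...).ravel()` produces for binary labels with the negative class first.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_nb(texts, labels):
    """Collect class log-priors and per-class word counts."""
    classes = sorted(set(labels))
    doc_counts = Counter(labels)
    word_counts = {c: Counter() for c in classes}
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = set().union(*word_counts.values())
    log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in classes}
    return classes, word_counts, vocab, log_prior


def predict_nb(model, text, alpha=1.0):
    """Return the class with the highest log posterior (up to a constant)."""
    classes, word_counts, vocab, log_prior = model
    scores = {}
    for c in classes:
        total = sum(word_counts[c].values())
        score = log_prior[c]
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-alpha (Laplace) smoothing avoids log(0) for unseen class/word pairs.
            score += math.log((word_counts[c][word] + alpha) / (total + alpha * len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


def confusion_matrix(y_true, y_pred, positive="spam"):
    """Return (TN, FP, FN, TP), treating `positive` as the positive class."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Toy data for illustration only.
train_texts = ["win money now", "free prize win", "meeting at noon", "see you at lunch"]
train_labels = ["spam", "spam", "ham", "ham"]
model = train_nb(train_texts, train_labels)

test_texts = ["free money", "lunch meeting"]
test_labels = ["spam", "ham"]
preds = [predict_nb(model, t) for t in test_texts]
print(preds)                                 # → ['spam', 'ham']
print(confusion_matrix(test_labels, preds))  # → (1, 0, 0, 1)
```

Accuracy follows directly from these counts as (TN + TP) / (TN + FP + FN + TP). Working in log space keeps the product of many small word probabilities from underflowing.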

We achieved a mean accuracy of 98.84% with a standard deviation of 0.4%. We are in the low-bias, low-variance region, as the learning curve below shows.

[Figure: learning curve]
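A mean accuracy with a standard deviation is typically obtained by scoring the model on several held-out folds. The README does not state the evaluation scheme or the number of folds, so this is a minimal sketch assuming plain k-fold cross-validation. `fit` and `predict` are hypothetical placeholders for any classifier, and the standard deviation is the population form, which matches NumPy's default `std` (ddof=0).

```python
from collections import Counter
from statistics import mean, pstdev


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size (no shuffling)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(X, y, fit, predict, k=5):
    """Return (mean, population std) of accuracy over k folds.

    `fit(X_train, y_train)` returns a model and `predict(model, x)`
    returns a label; both are hypothetical placeholders.
    """
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return mean(scores), pstdev(scores)


# Stand-in model for illustration: always predict the majority training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = list(range(10))
y = ["ham"] * 8 + ["spam"] * 2
print(cross_val_accuracy(X, y, fit_majority, predict_majority, k=5))
```

Because the toy folds are contiguous and unshuffled, the last fold contains only spam and scores 0, which is exactly the kind of spread a large standard deviation exposes; a real evaluation would usually shuffle or stratify the folds.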

About

Email Classification using Naive Bayes algorithm
