In this project, we're going to build a spam filter for SMS messages using the multinomial Naive Bayes algorithm. To train the algorithm, we'll use a dataset of 5,572 SMS messages that have already been classified by humans.
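As a rough illustration of how multinomial Naive Bayes classifies a message, here is a minimal pure-Python sketch. It is not necessarily how this project implements the algorithm. The helper names `tokenize`, `train`, and `classify`, the word-splitting regex, the Laplace smoothing parameter `alpha=1.0`, and the assumption that labels are exactly `"spam"` and `"ham"` are all illustrative choices. The four toy messages are made up and do not come from the dataset.

```python
import math
import re
from collections import Counter


def tokenize(message):
    """Lowercase a message and split it into word tokens (punctuation dropped)."""
    return re.findall(r"[a-z0-9']+", message.lower())


def train(messages, labels, alpha=1.0):
    """Estimate class priors and per-word likelihoods with Laplace smoothing.

    Assumes every label is either "spam" or "ham" and both classes occur.
    """
    vocabulary = {word for msg in messages for word in tokenize(msg)}
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter(labels)
    for msg, label in zip(messages, labels):
        # Multinomial model: repeated words are counted every time they appear.
        word_counts[label].update(tokenize(msg))

    priors = {label: label_counts[label] / len(labels) for label in word_counts}
    likelihoods = {}
    for label, counts in word_counts.items():
        denom = sum(counts.values()) + alpha * len(vocabulary)
        likelihoods[label] = {
            word: (counts[word] + alpha) / denom for word in vocabulary
        }
    return priors, likelihoods


def classify(message, priors, likelihoods):
    """Return the label with the highest log-posterior; unseen words are skipped."""
    scores = {}
    for label in priors:
        score = math.log(priors[label])
        for word in tokenize(message):
            if word in likelihoods[label]:
                score += math.log(likelihoods[label][word])
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data, for illustration only.
messages = [
    "WINNER claim your free prize now",
    "free entry win cash prize",
    "are we still meeting for lunch",
    "can you call me later",
]
labels = ["spam", "spam", "ham", "ham"]

priors, likelihoods = train(messages, labels)
print(classify("claim your free cash prize", priors, likelihoods))  # spam
```

Summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow on long messages, and the smoothing term keeps a word that never appeared in one class from forcing that class's probability to zero.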
The dataset was put together by Tiago A. Almeida and José María Gómez Hidalgo, and it can be downloaded from the UCI Machine Learning Repository.
Our goal is to create a spam filter that classifies new messages with an accuracy greater than 80%, meaning we expect more than 80% of new messages to be classified correctly as spam or ham (non-spam).
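The target can be checked with a simple accuracy measure: the number of correctly classified messages divided by the total number of messages. The sketch below assumes predictions and human-assigned labels are given as equal-length lists of `"spam"`/`"ham"` strings; the function name `accuracy` and the five example labels are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the human-assigned labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical predictions for five held-out messages, for illustration only.
predicted = ["ham", "spam", "ham", "ham", "spam"]
actual = ["ham", "spam", "ham", "spam", "spam"]

score = accuracy(predicted, actual)
print(score)         # 0.8
print(score > 0.80)  # False
```

Because the goal is strictly *greater than* 80%, a filter that gets exactly 4 of 5 messages right does not yet meet it.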