The goal is to build a machine learning model that can classify a given news headline as real or fake. To achieve this, we will use a few popular news datasets and, if the need arises, scrape additional fake news from websites. The first step is to create a dataset containing headlines and their respective class labels. The sources are:
- Fake and real news dataset: This is a collection of both fake and real news articles with features like title, text, subject and date.
Files: ./data/sources/Fake (2).csv and ./data/sources/True.csv
- Getting Real about Fake News: This dataset is only a first step in understanding and tackling this problem. It contains text and metadata scraped from 244 websites tagged as "bullshit" by the BS Detector Chrome Extension by Daniel Sieradski. This is a combination of fake news and conspiracy theories (which by default are still fake).
Files: ./data/sources/fake.csv
- Fake News: A binary classification dataset for both fake and real news articles.
Files: ./data/sources/fake_or_real_news.csv
- Source based Fake News Classification: A binary classification dataset for both fake and real news posts from social media. In an era where fake WhatsApp forwards and Tweets are capable of influencing naive minds, tools and knowledge have to be put to practical use not only to mitigate the spread of misinformation but also to inform people about the type of news they consume.
Files: ./data/sources/news_articles.csv
- AG News Classification Dataset: AG is a collection of more than 1 million news articles. The articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July 2004. The dataset is provided by the academic community for research purposes in data mining. The AG's news topic classification dataset is constructed by choosing the 4 largest classes from the original corpus.
Files: ./data/sources/News Classification train.csv and ./data/sources/News Classification test.csv
The final dataset created for our purpose of news classification is saved in ./data/TARP_Project_Final_Dataset.zip. The resulting dataset is approximately balanced and has very few null values.
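The merge step described above could be sketched roughly as follows. This is only an illustration, not the script that produced the final dataset: the helper names `load_headlines` and `build_dataset`, the label strings "fake" and "real", and the assumption that every source keeps its headline in a `title` column are hypothetical (only the Fake and real news dataset is stated to have a title feature).

```python
import csv


def load_headlines(path, label, title_column="title"):
    """Read one source CSV and return (headline, label) pairs.

    Rows whose title is missing or blank are skipped, which is one simple
    way to keep null values out of the combined dataset.
    """
    rows = []
    with open(path, newline="", encoding="utf-8") as handle:
        for record in csv.DictReader(handle):
            headline = (record.get(title_column) or "").strip()
            if headline:
                rows.append((headline, label))
    return rows


def build_dataset(sources):
    """Merge several (path, label) sources into one list of pairs.

    Exact duplicate headlines are kept only once, with the label of the
    source in which they first appear.
    """
    seen = set()
    merged = []
    for path, label in sources:
        for headline, row_label in load_headlines(path, label):
            if headline not in seen:
                seen.add(headline)
                merged.append((headline, row_label))
    return merged
```

Under these assumptions, a call such as `build_dataset([("./data/sources/Fake (2).csv", "fake"), ("./data/sources/True.csv", "real")])` would return the combined headline and label pairs for the first source listed above. Sources with a different column layout would need their own `title_column` or a separate loader.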
As of now we do not have a finished application, only an example Flask application.
It follows a modular structure and is meant to serve as an example of that layout.
To launch it, first activate the virtual environment and then run run.py from the home folder.
For Linux/Mac
source ./env/bin/activate
For Windows with Python 3.7
pip install -r requirements.txt
set FLASK_APP=run.py
set FLASK_DEBUG=1
flask run
The App folder contains everything related to the server.
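The modular structure is not spelled out here, so the following is only a minimal sketch of how such a Flask app is commonly organized, assuming an application factory and a single blueprint. The names `create_app`, `main_bp` and the `/health` route are hypothetical and not taken from the project. In a real layout these pieces would presumably live in separate modules under the App folder, with run.py importing the factory.

```python
from flask import Blueprint, Flask, jsonify

# Hypothetical blueprint; in a modular layout it would sit in its own module.
main_bp = Blueprint("main", __name__)


@main_bp.route("/health")
def health():
    """Return a small JSON payload so the server can be smoke-tested."""
    return jsonify(status="ok")


def create_app():
    """Application factory: build the app and register its blueprints."""
    app = Flask(__name__)
    app.register_blueprint(main_bp)
    return app


# run.py would expose this object so that FLASK_APP=run.py can locate it.
app = create_app()
```

Keeping routes in blueprints and building the app in a factory makes it straightforward to add a prediction endpoint later without touching run.py.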