A data science project
In this project we analyzed a dataset from HuffPost containing roughly 200,000 news articles, each with a headline, description, category, and publication date. Using the words of the headline and/or description, we tried to predict the human-labeled category of each article with word embeddings and common machine learning models.
This project was created in collaboration with Michael Hodel (https://github.com/michaelhodel) as part of the lecture Introduction to Data Science offered at the University of Zürich.
The dataset is available on Kaggle: https://www.kaggle.com/rmisra/news-category-dataset.
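
As a rough illustration of the general idea described above (headline words, then word embeddings, then a classifier), here is a minimal sketch. It is not the code used in this project. The tiny 3-dimensional word vectors, the example headlines, the category labels, and the nearest-centroid classifier are all assumptions made up for this example; a real pipeline would more likely load pretrained embeddings and train a model from a machine learning library.

```python
import math
from collections import defaultdict

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# A real project would load pretrained embeddings instead.
TOY_EMBEDDINGS = {
    "election": [0.9, 0.1, 0.0],
    "senate": [0.8, 0.2, 0.1],
    "vote": [0.9, 0.0, 0.1],
    "recipe": [0.1, 0.9, 0.0],
    "dinner": [0.0, 0.8, 0.2],
    "chicken": [0.1, 0.9, 0.1],
    "match": [0.1, 0.0, 0.9],
    "team": [0.2, 0.1, 0.8],
    "coach": [0.0, 0.1, 0.9],
}
DIM = 3


def embed(text):
    """Represent a text as the average vector of its known words."""
    vectors = [TOY_EMBEDDINGS[w] for w in text.lower().split() if w in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM  # no known words: fall back to the zero vector
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def fit_centroids(headlines, labels):
    """Compute one mean embedding (centroid) per category."""
    grouped = defaultdict(list)
    for headline, label in zip(headlines, labels):
        grouped[label].append(embed(headline))
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(headline, centroids):
    """Return the category whose centroid is most similar to the headline."""
    vec = embed(headline)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Made-up training examples with placeholder category names.
train_headlines = [
    "Senate vote delayed",
    "Election results expected",
    "Easy chicken recipe",
    "Dinner ideas tonight",
    "Coach praises team",
    "Team wins match",
]
train_labels = ["politics", "politics", "food", "food", "sports", "sports"]

centroids = fit_centroids(train_headlines, train_labels)
prediction = predict("chicken dinner", centroids)
```

Averaging word vectors discards word order, which keeps the sketch short; the nearest-centroid rule simply stands in for whichever classifier is trained on top of the embedded headlines.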