Newspaper article categorization

A data science project

In this project we analyzed a dataset obtained from HuffPost containing roughly 200,000 news articles, each with a headline, a description, a category, and a publication date. Using the words in the headline and/or description, we tried to predict the human-labeled category with word embeddings and common machine learning models.
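
As a rough illustration of the idea above, here is a minimal sketch that averages word vectors per headline and assigns the category whose centroid is most similar. It is not the code from the notebooks: the toy vectors, the example headlines, the function names (`embed`, `fit_centroids`, `predict`), and the nearest-centroid classifier are all assumptions chosen to keep the example dependency-free. A real run would presumably use pretrained embeddings and a standard machine learning model.

```python
import math
from collections import defaultdict

# Toy 2-dimensional word vectors (hypothetical values, for illustration only).
# A real run would load pretrained embeddings instead.
TOY_VECTORS = {
    "election": [1.0, 0.0],
    "senate": [0.9, 0.1],
    "vote": [0.8, 0.2],
    "recipe": [0.0, 1.0],
    "dinner": [0.1, 0.9],
    "pasta": [0.2, 0.8],
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed(text, vectors=TOY_VECTORS):
    """Average the vectors of all known words in `text`; None if none are known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    return mean_vector(known) if known else None


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_centroids(texts, categories):
    """Compute one mean embedding (centroid) per category."""
    grouped = defaultdict(list)
    for text, category in zip(texts, categories):
        vec = embed(text)
        if vec is not None:
            grouped[category].append(vec)
    return {category: mean_vector(vecs) for category, vecs in grouped.items()}


def predict(text, centroids):
    """Return the category with the most similar centroid, or None if no word is known."""
    vec = embed(text)
    if vec is None:
        return None
    return max(centroids, key=lambda category: cosine(vec, centroids[category]))


# Made-up training examples; the category names are illustrative labels.
train_headlines = ["Senate vote delayed", "Election results", "Easy pasta recipe", "Dinner ideas"]
train_categories = ["POLITICS", "POLITICS", "FOOD & DRINK", "FOOD & DRINK"]

centroids = fit_centroids(train_headlines, train_categories)
print(predict("pasta dinner", centroids))
```

Averaging word vectors discards word order, which keeps the sketch short; texts without any known word yield `None` rather than a guessed category.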

This project was created in collaboration with Michael Hodel (https://github.com/michaelhodel) for the lecture Introduction to Data Science at the University of Zürich.

The dataset can be found here: https://www.kaggle.com/rmisra/news-category-dataset.
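
Once downloaded, the data could be read along the following lines. This is a sketch under assumptions, not the loading code from the notebooks: it assumes the file stores one JSON object per line and that the fields are named `headline`, `short_description`, `category`, and `date`. The helper `parse_records` and the sample record are made up for illustration.

```python
import json


def parse_records(lines):
    """Parse an iterable of JSON lines into dicts with the four fields used here."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        records.append({
            "headline": row.get("headline", ""),
            "description": row.get("short_description", ""),  # assumed field name
            "category": row.get("category"),
            "date": row.get("date"),
        })
    return records


# Made-up sample record in the assumed format.
sample = [
    '{"category": "POLITICS", "headline": "Example headline", '
    '"short_description": "Example text", "date": "2018-05-26"}',
    "",
]
records = parse_records(sample)
print(records[0]["category"], records[0]["headline"])
```

Because `parse_records` accepts any iterable of strings, an open file handle for the downloaded file can be passed to it directly.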

Repository contents:

1_exploratory_data_analysis.ipynb
2_naive_classifier.ipynb
3_word_embeddings.ipynb
4_similarity_graphs.ipynb
5_predicting_categories.ipynb
README.md
presentation.pdf
report.pdf

andrinr/uzh-data-science
