
This repository was created as part of the Data Mining course of the Computer Science master’s program at TH Köln and examines the Trending YouTube Video Statistics data set from Kaggle.


StephanHagge/data-mining


Data Mining

This repository was created as part of the Data Mining course of the Computer Science master’s program at TH Köln.

Description

This repository examines Kaggle's Trending YouTube Video Statistics data set. The data is analyzed and, using both the attributes provided and attributes derived from them, the time a video takes to start trending after publication is predicted. The data set contains only videos that have actually trended. Further information can be found in the Business Understanding section.
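As a rough illustration of the prediction target described above, the following sketch computes the number of days between publication and trending for a few made-up rows. The field names `publish_time` and `trending_date`, their formats, and the helper `days_until_trending` are assumptions made for this example and are not taken from this repository's code.

```python
from datetime import datetime

# Assumed formats (not stated in this README): trending_date as "yy.dd.mm",
# publish_time as an ISO 8601 UTC timestamp such as "2017-11-13T17:13:01.000Z".
TRENDING_FORMAT = "%y.%d.%m"
PUBLISH_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def days_until_trending(publish_time: str, trending_date: str) -> int:
    """Return the number of whole calendar days from publication to trending."""
    published = datetime.strptime(publish_time, PUBLISH_FORMAT).date()
    trending = datetime.strptime(trending_date, TRENDING_FORMAT).date()
    return (trending - published).days


# Made-up example rows, for illustration only.
rows = [
    {"video_id": "a1", "publish_time": "2017-11-13T17:13:01.000Z", "trending_date": "17.14.11"},
    {"video_id": "b2", "publish_time": "2017-11-10T08:00:00.000Z", "trending_date": "17.14.11"},
]

for row in rows:
    row["days_until_trending"] = days_until_trending(
        row["publish_time"], row["trending_date"]
    )
    print(row["video_id"], row["days_until_trending"])
```

The same function could be applied to a pandas column with `Series.apply`, or replaced by vectorized `pandas.to_datetime` calls once the actual column formats are confirmed.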

The evaluation focuses primarily on the data for Germany. The evaluation section also includes a comparison with selected other regions. Various algorithms are used for the predictions, with an emphasis on classifiers.
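Because classifiers need discrete labels, a continuous target such as "days until trending" would typically be binned into classes first. The sketch below shows one possible binning; the bin edges, the labels, and the function `to_class` are hypothetical and may differ from what the notebooks in this repository actually use.

```python
from bisect import bisect_left

# Hypothetical inclusive upper bounds (in days) for the first three classes.
BIN_EDGES = [1, 3, 7]
# One label per bin, plus a final catch-all class.
LABELS = ["<=1 day", "2-3 days", "4-7 days", ">7 days"]


def to_class(days: int) -> str:
    """Map a number of days to a class label using the edges above."""
    # bisect_left returns the index of the first edge that is >= days.
    return LABELS[bisect_left(BIN_EDGES, days)]


for days in (0, 3, 4, 30):
    print(days, "->", to_class(days))
```

The resulting labels can then serve as the target variable for any of the classifiers in sklearn or xgboost.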

Structure of this repository

Requirements

Programs:

Additional Python packages:

  • numpy
  • pandas
  • scipy
  • matplotlib
  • seaborn
  • pycountry
  • scikit-learn (imported as sklearn)
  • xgboost

For a reproducible environment:

pip install -r requirements.txt
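After installing, a quick check like the following can confirm that the packages listed above are importable. This is a minimal sketch using only the standard library; the helper `missing_modules` is not part of this repository, and the names in `MODULES` are import names, which can differ from the names used on PyPI.

```python
from importlib.util import find_spec

# Import names of the packages listed above.
MODULES = [
    "numpy",
    "pandas",
    "scipy",
    "matplotlib",
    "seaborn",
    "pycountry",
    "sklearn",
    "xgboost",
]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(MODULES)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages were found.")
```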
