Won 3rd Prize in the Video Popularity Hackathon organised by Bitgrit and the Mathematics and Computing Society, DTU.
- Install Jupyter Notebook (or use Google Colab)
- Download the zip files and extract them into a directory
- Open the provided Jupyter notebook
- Set the location of the training and testing datasets provided with the code
- Run the notebook to get the predictions
- Data visualisation is also done in the code. Refer to the comments for details
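As a rough illustration of the data-loading step above, here is a minimal sketch. It assumes the datasets are CSV files readable with pandas; the function name `load_datasets` and the toy column values are hypothetical and only stand in for the real files shipped with the code.

```python
import io

import pandas as pd


def load_datasets(train_source, test_source):
    """Read the training and testing tables from paths or file-like objects."""
    train_df = pd.read_csv(train_source)
    test_df = pd.read_csv(test_source)
    return train_df, test_df


# Hypothetical in-memory stand-ins for the real files (assumption: CSV format).
train_csv = io.StringIO("comp_id,duration,views\n1,60,100\n2,120,250\n")
test_csv = io.StringIO("comp_id,duration\n3,90\n")

train_df, test_df = load_datasets(train_csv, test_csv)
print(train_df.shape, test_df.shape)
```

In the notebook, the two arguments would be the paths to the extracted training and testing files instead of the in-memory strings used here.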
The metadata was used as given. For the image pixel data, description data and title data, I used the average of the provided values to keep the analysis simple. In the visualisations, the results from this approach looked satisfactory.
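The averaging idea can be sketched as follows. This is only an illustration: it assumes the pixel values arrive as several numeric columns per video, and the `img_` column prefix and the toy numbers are hypothetical, not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy table: three pixel columns plus one metadata column.
df = pd.DataFrame(
    {
        "img_0": [0.0, 10.0],
        "img_1": [128.0, 20.0],
        "img_2": [255.0, 30.0],
        "duration": [60, 120],
    }
)

# Collapse all pixel columns into a single per-row mean.
pixel_cols = [c for c in df.columns if c.startswith("img_")]
df["Average"] = df[pixel_cols].mean(axis=1)

# Drop the raw pixel columns once they are summarised.
df = df.drop(columns=pixel_cols)
print(df)
```

The same pattern would apply to the description and title values, each reduced to one averaged column.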
I didn't use 'views' and 'comp_id' from the training dataset, for obvious reasons. The features used are:
- embed
- ratio
- duration
- language
- partner
- n_likes
- n_tags
- n_formats
- hour
- Average (average of the pixel data of the images)
- average_d (average of the description data)
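Selecting the features listed above could look like the sketch below. The column names come from the list; the helper `select_features` and the toy DataFrame are hypothetical and assume every feature is already present as a column.

```python
import pandas as pd

# Feature names as listed above.
FEATURES = [
    "embed", "ratio", "duration", "language", "partner", "n_likes",
    "n_tags", "n_formats", "hour", "Average", "average_d",
]
# Columns deliberately left out of the model inputs.
EXCLUDED = ["views", "comp_id"]


def select_features(df):
    """Return only the feature columns, failing loudly if any is missing."""
    missing = [c for c in FEATURES if c not in df.columns]
    if missing:
        raise KeyError(f"missing feature columns: {missing}")
    return df[FEATURES].copy()


# Hypothetical toy table containing every feature plus the excluded columns.
toy = pd.DataFrame({name: [0, 1] for name in FEATURES + EXCLUDED})
X = select_features(toy)
print(X.columns.tolist())
```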
I removed a few extreme values to avoid outliers. The deleted values and the method used to delete them are given in the code, with comments explaining them.
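The exact removal rule is documented in the notebook and is not reproduced here. As one possible way to drop extreme values, the sketch below uses an upper-quantile cutoff; the function `drop_extremes`, the 0.99 threshold and the toy values are all assumptions made for illustration.

```python
import pandas as pd


def drop_extremes(df, column, upper_q=0.99):
    """Keep rows whose value in `column` is at or below the given quantile."""
    cutoff = df[column].quantile(upper_q)
    return df[df[column] <= cutoff]


# Hypothetical toy data with one obviously extreme value.
toy = pd.DataFrame({"n_likes": [1, 2, 3, 4, 1000]})
cleaned = drop_extremes(toy, "n_likes")
print(cleaned["n_likes"].tolist())
```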
Note: I got the best results with Linear Regression. I also tried XGBoost, Random Forest and SVM regression, but Linear Regression gave the most suitable results.
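A comparison of this kind can be sketched with scikit-learn's cross-validation. The example below runs on synthetic data, so its scores say nothing about the hackathon dataset. XGBoost is left out to keep the dependencies to scikit-learn only, and the model settings are assumptions rather than the ones used in the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

# Synthetic, roughly linear data (assumption: stands in for the real features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([3.0, -2.0, 0.5, 1.0]) + rng.normal(scale=0.1, size=200)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "svr": SVR(),
}

# Mean 5-fold cross-validated R^2 for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Scoring every model with the same folds and metric keeps the comparison fair; the metric itself can be swapped for whichever one the competition used.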
Note: The image dataset is too big to upload here, so I have provided this link to download it.