
A regression system was designed to predict the IMDb rating of a movie. A movie viewer would otherwise have to rely on a critic's review or their own instincts. A dataset, obtained from Kaggle, contains certain attributes (such as genre, duration, actor and director names, number of voters for the rating, plot and keywords, language, etc.) and per…


amanvora/Movie-Rating-Prediction


Libraries required:
1. scikit-learn
2. Matplotlib
3. NumPy



Instructions to run:
1. Run preproc.py first. It generates 'preprocessed.txt' and 'imdbScores.txt', which the other two scripts use.

2. modelSelCV.py and preproc.py each contain commented-out sections. Uncomment them to plot the covariance matrix and the categorical features, respectively.

3. Model selection exhaustively evaluates every model across all of its hyperparameters, so it takes a long time to run (approx. 30 minutes, depending on the system).
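
To illustrate why step 3 is slow, here is a minimal sketch of an exhaustive search over several models and their hyperparameter grids with cross-validation. It is not the code in modelSelCV.py: the candidate models (Ridge, SVR), the grid values, the 5-fold setting and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): the real scripts read the
# files produced by preproc.py instead.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=120)

# Hypothetical candidate models and grids; the repository's actual
# models and hyperparameter ranges may differ.
candidates = {
    "ridge": (Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}),
    "svr": (SVR(), {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Every grid combination is fitted once per fold, which is why the
    # run time grows quickly with the number of models and values.
    search = GridSearchCV(estimator, grid, cv=5, scoring="neg_mean_squared_error")
    search.fit(X, y)
    # best_score_ is the negated MSE, so flip the sign back.
    results[name] = (search.best_params_, -search.best_score_)

# Pick the model with the lowest cross-validated mean squared error.
best_model = min(results, key=lambda n: results[n][1])
print(best_model, results[best_model])
```

With 4 + 6 grid combinations and 5 folds, this sketch already performs 50 cross-validation fits, plus one refit per model. Larger grids and datasets scale the cost accordingly.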




Code referred:
1. For one-hot encoding of categorical features:
http://fastml.com/converting-categorical-data-into-numbers-with-pandas-and-scikit-learn/

2. For splitting the data into (train, test) sets and loading the dataset:
https://www.kaggle.com/nsnilay/d/deepmatrix/imdb-5000-movie-dataset/predicting-imdb-ratings-using-numerical-attributes/notebook
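
The two techniques referenced above can be sketched with scikit-learn alone. This is an illustrative example, not the code from preproc.py or the linked pages: the column contents, the toy values and the 67/33 split ratio are made up for the sketch.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Toy data (assumption): one categorical column, one numeric column,
# and a made-up rating per movie.
genres = np.array([["Action"], ["Drama"], ["Comedy"], ["Drama"], ["Action"], ["Comedy"]])
durations = np.array([[120.0], [95.0], [101.0], [130.0], [88.0], [110.0]])
scores = np.array([7.1, 6.4, 5.9, 8.0, 6.2, 7.3])

# One-hot encode the categorical column: each distinct genre becomes
# its own 0/1 column. The encoder returns a sparse matrix by default,
# so convert it to a dense array before stacking.
encoder = OneHotEncoder(handle_unknown="ignore")
genre_onehot = encoder.fit_transform(genres).toarray()

# Combine numeric and encoded categorical features into one matrix.
X = np.hstack([durations, genre_onehot])

# Hold out a third of the rows for testing; random_state makes the
# split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, scores, test_size=0.33, random_state=0
)
print(X_train.shape, X_test.shape)
```

Three distinct genres produce three indicator columns, so the feature matrix has four columns in total.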
