khuyentran1401/Web-scrape-Ghibli-Movie-Database

About this Project

Extract data from a Ghibli movie database, preprocess the data, and perform sentiment analysis to classify each movie as negative, positive, or neutral.

Data Source

Extract data from Movie Database

Notebook and Article

Access the notebook of this project here and the tutorial article here

Tools

Techniques

  1. Scrape the movie database with BeautifulSoup
     • Extract title, URL, image, rank, and rating
  2. Preprocess the data
     • Put the data into a dataframe
     • Convert strings into numerical values
     • Transform categorical variables (movie categories) into binary indicator columns
  3. Preprocess text with NLTK
     • Remove punctuation and stopwords
     • Lemmatize words
  4. Perform sentiment analysis on CountVectorizer (bag-of-words) features
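
The scraping and data preprocessing steps might look roughly like the sketch below. It is illustrative only and is not taken from the project notebook: the HTML snippet, the tag and class names, the `data-rank` attribute, the movie values, and the helper names `scrape` and `preprocess` are all assumptions made so the example runs without network access. A real page would need its own selectors.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup with made-up values; the real site's structure will differ.
HTML = """
<ul>
  <li class="movie" data-rank="1">
    <a href="/m/movie-a"><img src="/img/a.jpg">Movie A</a>
    <span class="rating">8.6</span>
    <span class="genres">Animation, Adventure</span>
  </li>
  <li class="movie" data-rank="2">
    <a href="/m/movie-b"><img src="/img/b.jpg">Movie B</a>
    <span class="rating">8.1</span>
    <span class="genres">Animation, Family</span>
  </li>
</ul>
"""


def scrape(html: str) -> pd.DataFrame:
    """Extract title, url, image, rank, rating, and genres into a dataframe."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.select("li.movie"):
        link = item.find("a")
        rows.append({
            "title": link.get_text(strip=True),
            "url": link["href"],
            "image": link.find("img")["src"],
            "rank": item["data-rank"],  # still a string at this point
            "rating": item.find("span", class_="rating").get_text(strip=True),
            "genres": item.find("span", class_="genres").get_text(strip=True),
        })
    return pd.DataFrame(rows)


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Convert string columns to numbers and genres to 0/1 indicator columns."""
    out = df.copy()
    out["rank"] = pd.to_numeric(out["rank"])
    out["rating"] = pd.to_numeric(out["rating"])
    # One binary column per genre, e.g. "Animation" -> 1 if the movie has it.
    dummies = out["genres"].str.get_dummies(sep=", ")
    return pd.concat([out.drop(columns="genres"), dummies], axis=1)


movies = preprocess(scrape(HTML))
print(movies)
```

For a live site, the HTML string would come from an HTTP request instead of the inline constant.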

Result

Accuracy: 0.6049
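
An accuracy figure like this is the fraction of held-out examples whose predicted label matches the true label. The sketch below shows one way such a number could be computed from CountVectorizer features. Several parts are assumptions rather than the project's method: the README does not name the classifier, so `MultinomialNB` is a stand-in; the texts and labels are made up; and punctuation and stopwords are removed with `str.translate` and CountVectorizer's built-in English stopword list instead of NLTK, so no corpus download is needed. Lemmatization is omitted.

```python
import string

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Made-up texts and labels, for illustration only.
texts = [
    "A wonderful, moving story!", "Beautiful animation and lovely music.",
    "Charming and heartfelt film.",
    "Boring plot; weak characters.", "Terrible pacing, awful ending.",
    "Dull, tedious and forgettable.",
    "It has a plot and some characters.", "The film runs two hours.",
    "An animated film about a journey.",
]
labels = ["positive"] * 3 + ["negative"] * 3 + ["neutral"] * 3


def clean(text: str) -> str:
    """Strip punctuation and lowercase (stand-in for the NLTK cleaning step)."""
    return text.translate(str.maketrans("", "", string.punctuation)).lower()


cleaned = [clean(t) for t in texts]

# Hold out three examples, one per class, for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    cleaned, labels, test_size=3, random_state=0, stratify=labels
)

# Bag-of-words counts; stop_words="english" drops common English words.
vectorizer = CountVectorizer(stop_words="english")
model = MultinomialNB()  # assumed classifier, not confirmed by the README
model.fit(vectorizer.fit_transform(X_train), y_train)

predictions = model.predict(vectorizer.transform(X_test))
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.4f}")
```

With so few examples the printed score is not meaningful. The point is the flow: fit the vectorizer on training text only, transform the test text with the same vocabulary, then compare predictions with the true labels.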
