Extract data from the Ghibli movie database, preprocess it, and perform sentiment analysis to predict whether the sentiment toward a movie is negative, positive, or neutral.
Extract data from Movie Database
Access the notebook of this project here and the tutorial article here
- Beautiful Soup: a Python library for pulling data out of HTML and XML files.
- NumPy, pandas: tools to read and preprocess data
- NLTK (Natural Language Toolkit): tool to preprocess text
- Scikit-learn: tool to perform sentiment analysis
- Scrape the movie database with BeautifulSoup
  - Extract title, URL, image, rank, and rating
- Preprocess data
  - Put data into a dataframe
  - Convert strings into numerical values
  - Transform categorical variables (movie categories) into binary columns
- Preprocess text with NLTK
  - Remove punctuation and stopwords
  - Lemmatize words
- Perform sentiment analysis with CountVectorizer
Accuracy: 0.6049
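
The scraping and data preprocessing steps can be sketched roughly as follows. This is a minimal illustration, not the project's notebook code: the HTML snippet, its tag and class names, the placeholder titles and values, and the helper names `parse_movies` and `preprocess` are all assumptions made for the example, since the real page layout is not described here.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a scraped page. The real site's
# tags, class names, and values are not given in this README.
SAMPLE_HTML = """
<div class="movie" data-rank="1">
  <a class="title" href="/movie/a">Movie A</a>
  <img src="/img/a.jpg">
  <span class="rating">8.5</span>
  <span class="genres">Animation, Fantasy</span>
</div>
<div class="movie" data-rank="2">
  <a class="title" href="/movie/b">Movie B</a>
  <img src="/img/b.jpg">
  <span class="rating">7.0</span>
  <span class="genres">Animation, Family</span>
</div>
"""


def parse_movies(html):
    """Extract title, url, image, rank, rating, and genres into a dataframe."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="movie"):
        link = card.find("a", class_="title")
        rows.append({
            "title": link.get_text(strip=True),
            "url": link["href"],
            "image": card.find("img")["src"],
            "rank": card["data-rank"],
            "rating": card.find("span", class_="rating").get_text(strip=True),
            "genres": card.find("span", class_="genres").get_text(strip=True),
        })
    return pd.DataFrame(rows)


def preprocess(df):
    """Convert string columns to numbers and expand genres into 0/1 columns."""
    df = df.copy()
    df["rank"] = pd.to_numeric(df["rank"])
    df["rating"] = pd.to_numeric(df["rating"])
    # One binary column per category found in the comma-separated string.
    genre_flags = df["genres"].str.get_dummies(sep=", ")
    return pd.concat([df.drop(columns="genres"), genre_flags], axis=1)


movies = preprocess(parse_movies(SAMPLE_HTML))
print(movies)
```

In a real run, the HTML would come from an HTTP response rather than an inline string, and the selectors would need to match the actual page.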
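
For the sentiment analysis step, CountVectorizer turns each text into word counts, which a classifier then maps to a label. The classifier is not named in this README, so the sketch below assumes multinomial naive Bayes purely for illustration. The six training texts and their labels are invented toy data, so this does not reproduce the accuracy reported above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy data: already-cleaned texts with three sentiment classes.
texts = [
    "wonderful moving story",
    "beautiful animation lovely music",
    "boring slow plot",
    "terrible dull ending",
    "average film nothing special",
    "okay watchable ordinary",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# Bag-of-words counts feed a naive Bayes classifier (an assumed choice).
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

prediction = model.predict(["lovely wonderful story"])[0]
print(prediction)
```

On real data, the texts would be split into training and test sets before fitting, and accuracy would be measured on the held-out portion only.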