About this Project

Extract data from a Ghibli movie database, preprocess it, and perform sentiment analysis to predict whether each movie is negative, positive, or neutral.

Data Source

Extract data from Movie Database

Notebook and Article

Access the notebook of this project here and the tutorial article here

Tools

Beautiful Soup: a Python library for pulling data out of HTML and XML files.
NumPy, pandas: tools to read and preprocess data
NLTK (Natural Language Toolkit): tool to preprocess text
Scikit-learn: tool to perform sentiment analysis

Techniques

Scrape movie database with BeautifulSoup

Extract title, url, image, rank, and rating
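A minimal sketch of this scraping step, run on an inline HTML snippet rather than the live site. The markup, class names, movie names, and values are all invented placeholders; the real database page will be structured differently, so the selectors would need adjusting.

```python
from bs4 import BeautifulSoup

# Hypothetical markup and placeholder values, used only to illustrate the parsing.
SAMPLE_HTML = """
<ul>
  <li class="movie">
    <span class="rank">1</span>
    <a class="title" href="/movies/example-a">Example Movie A</a>
    <img src="/img/example-a.jpg">
    <span class="rating">8.5</span>
  </li>
  <li class="movie">
    <span class="rank">2</span>
    <a class="title" href="/movies/example-b">Example Movie B</a>
    <img src="/img/example-b.jpg">
    <span class="rating">7.9</span>
  </li>
</ul>
"""


def parse_movies(html):
    """Return one dict per movie with title, url, image, rank, and rating."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for item in soup.select("li.movie"):
        link = item.select_one("a.title")
        movies.append({
            "title": link.get_text(strip=True),
            "url": link["href"],
            "image": item.select_one("img")["src"],
            # Values are kept as strings here; conversion happens during preprocessing.
            "rank": item.select_one("span.rank").get_text(strip=True),
            "rating": item.select_one("span.rating").get_text(strip=True),
        })
    return movies


movies = parse_movies(SAMPLE_HTML)
print(movies)
```

For a live page, the HTML string would come from an HTTP response body instead of `SAMPLE_HTML`.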

Preprocess data

Put data into a dataframe
Convert string into numerical values
Transform categorical variables (movie categories) into binary
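The three preprocessing steps above could look roughly like the following. The records, the column names, and the `|` separator for categories are assumptions made for illustration, not the notebook's actual data.

```python
import pandas as pd

# Placeholder records standing in for the scraped data (all values are made up).
records = [
    {"title": "Example Movie A", "rank": "1", "rating": "8.5",
     "categories": "Animation|Fantasy"},
    {"title": "Example Movie B", "rank": "2", "rating": "7.9",
     "categories": "Animation|Drama"},
]

# 1. Put data into a dataframe
df = pd.DataFrame(records)

# 2. Convert strings into numerical values (unparseable strings become NaN)
df["rank"] = pd.to_numeric(df["rank"], errors="coerce")
df["rating"] = pd.to_numeric(df["rating"], errors="coerce")

# 3. Turn the category strings into one binary (0/1) column per category
category_flags = df["categories"].str.get_dummies(sep="|")
df = pd.concat([df.drop(columns="categories"), category_flags], axis=1)

print(df)
```

Using `errors="coerce"` is one possible choice; it keeps malformed values visible as `NaN` instead of raising an exception.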

Preprocess text with NLTK

Remove punctuation and stopwords
Lemmatize words
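One way the text cleaning above might be written. The helper names `load_nltk_resources` and `clean_text` are hypothetical, and splitting on whitespace is a simplification; the notebook may tokenize differently. The NLTK corpora are downloaded only when `load_nltk_resources` is called.

```python
import string


def load_nltk_resources():
    """Return NLTK's English stopword set and a WordNet lemmatizing function."""
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    # Fetch the required corpora if they are not already present.
    nltk.download("stopwords", quiet=True)
    nltk.download("wordnet", quiet=True)
    nltk.download("omw-1.4", quiet=True)  # needed by some NLTK versions
    return set(stopwords.words("english")), WordNetLemmatizer().lemmatize


def clean_text(text, stop_words, lemmatize):
    """Remove punctuation and stopwords, then lemmatize the remaining words."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.lower().split()
    return [lemmatize(token) for token in tokens if token not in stop_words]
```

A typical call would be `stop_words, lemmatize = load_nltk_resources()` followed by `clean_text(review_text, stop_words, lemmatize)`, where `review_text` is any string to clean.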

Perform sentiment analysis using CountVectorizer word-count features
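As a rough sketch of this step: `CountVectorizer` turns each text into a vector of word counts, and a classifier is then trained on those vectors. This README does not say which classifier the notebook uses, so `MultinomialNB` below is an assumption, and the training texts and labels are invented toy data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (made up for illustration only).
train_texts = [
    "a wonderful and moving story",
    "beautiful animation and a wonderful score",
    "boring plot and a dull ending",
    "a dull and disappointing film",
    "an average film with an ordinary plot",
    "ordinary story, nothing special",
]
train_labels = [
    "positive", "positive",
    "negative", "negative",
    "neutral", "neutral",
]

# Word counts feed directly into the classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

predictions = model.predict(["a wonderful film with beautiful animation"])
print(predictions)
```

On real data, the texts would be split into training and test sets, and an accuracy figure like the one reported under Result would be computed on the held-out portion.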

Result

Accuracy: 0.6049

khuyentran1401/Web-scrape-Ghibli-Movie-Database
