n-hien/Amazon-Web-Scraping

Amazon Product Analysis

Web Scraping

- Libraries: requests, BeautifulSoup, pandas
- The get_data function extracts the product data from a single page and appends it to a DataFrame
- A loop then runs over all pages, and the resulting dataset is saved to a .csv file
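
The scraping steps above could look roughly like the following. This is a minimal sketch, not the repository's actual code: the search URL, the request headers, the CSS selectors, the "Title" column, the `scrape` helper and the output file name are all assumptions, and Amazon's real markup may differ or change.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Hypothetical values: the README does not give the URL or headers used.
SEARCH_URL = "https://www.amazon.com/s"
HEADERS = {"User-Agent": "Mozilla/5.0"}

# "Title" is an assumed extra column; the others are named in this README.
COLUMNS = ["Asin", "Title", "Price", "Rating", "Number of Rating"]


def _text(node):
    """Return the stripped text of a tag, or None if the tag is missing."""
    return node.get_text(strip=True) if node is not None else None


def get_data(html):
    """Parse one search-result page into a DataFrame, one row per product."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Assumed markup: each product card is a div with a data-asin attribute.
    for card in soup.select("div[data-asin]"):
        asin = card.get("data-asin")
        if not asin:
            continue  # skip placeholder cards without an ASIN
        rows.append({
            "Asin": asin,
            "Title": _text(card.select_one("h2")),
            "Price": _text(card.select_one("span.a-offscreen")),
            "Rating": _text(card.select_one("span.a-icon-alt")),
            "Number of Rating": _text(card.select_one("span.a-size-base")),
        })
    return pd.DataFrame(rows, columns=COLUMNS)


def scrape(keyword, pages, out_path="products.csv"):
    """Fetch each page, parse it with get_data, and save everything as CSV."""
    frames = []
    for page in range(1, pages + 1):
        resp = requests.get(SEARCH_URL, params={"k": keyword, "page": page},
                            headers=HEADERS, timeout=10)
        resp.raise_for_status()
        frames.append(get_data(resp.text))
    df = pd.concat(frames, ignore_index=True)
    df.to_csv(out_path, index=False)
    return df
```

Keeping the parsing in get_data separate from the HTTP requests means the parser can be checked against a saved HTML page without touching the network.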

Product Price Analysis

- Libraries: matplotlib, pandas, seaborn
- First, duplicates in the "Asin" column are removed
- Then the Price, Rating and Number of Rating columns are converted to numeric types for analysis
- DataFrame.describe() and a boxplot are used to inspect the distribution of product prices

-----------------
count    250.00000
mean      73.25436
std       72.46441
min        6.79000
25%       29.99000
50%       41.19000
75%       92.49000
max      619.80000
Name: Price, dtype: float64
-----------------
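
The cleaning steps that lead to a summary like this one can be sketched as below. It assumes the scraped columns hold raw strings such as "$19.99", "4.5 out of 5 stars" and "1,234"; the README does not show the raw format, and `clean_products` is a hypothetical helper name.

```python
import pandas as pd


def clean_products(df):
    """Drop duplicate ASINs and convert the text columns to numbers."""
    df = df.drop_duplicates(subset="Asin").copy()

    # Price: keep only digits and the decimal point ("$1,299.00" -> 1299.0).
    price = df["Price"].astype(str).str.replace(r"[^0-9.]", "", regex=True)
    df["Price"] = pd.to_numeric(price, errors="coerce")

    # Rating: take the first number ("4.5 out of 5 stars" -> 4.5).
    rating = df["Rating"].astype(str).str.extract(r"(\d+(?:\.\d+)?)")[0]
    df["Rating"] = pd.to_numeric(rating, errors="coerce")

    # Number of Rating: drop thousands separators ("1,234" -> 1234).
    count = df["Number of Rating"].astype(str).str.replace(",", "", regex=False)
    df["Number of Rating"] = pd.to_numeric(count, errors="coerce")
    return df
```

With `errors="coerce"`, values that cannot be parsed become NaN instead of raising an error. Calling `clean_products(df)["Price"].describe()` then gives a summary of the kind shown above.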

- The top 20 products by Rating are then filtered and shown in a scatter plot

- We may want to buy the product with Asin B09J8HTDHX
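
A minimal sketch of the top-20 filter and the scatter plot follows. The README does not say which axes the plot uses or how ties in Rating are handled, so plotting Price against Number of Rating and breaking ties by Number of Rating are both assumptions. `top_rated` and `plot_top` are hypothetical names.

```python
import matplotlib.pyplot as plt
import pandas as pd


def top_rated(df, n=20):
    """Return the n highest-rated products.

    Ties in Rating are broken by Number of Rating (an assumption).
    """
    ordered = df.sort_values(["Rating", "Number of Rating"], ascending=False)
    return ordered.head(n)


def plot_top(top):
    """Scatter Price against Number of Rating for the selected products."""
    fig, ax = plt.subplots()
    ax.scatter(top["Price"], top["Number of Rating"])
    ax.set_xlabel("Price")
    ax.set_ylabel("Number of Rating")
    ax.set_title("Top products by Rating")
    return ax
```

Highly rated products that have many ratings at a low price stand out in the upper left of such a plot, which is one way to shortlist a product.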

Product Review Analysis

- Libraries: requests, BeautifulSoup, pandas, nltk
- The get_review function collects all reviews of the products
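
As a rough illustration, the parsing half of get_review might look like the sketch below. The signature (an HTML string plus an ASIN), the `data-hook` selectors and the output columns are assumptions rather than the repository's code. Fetching the page with requests and any nltk processing are left out because the README does not describe them.

```python
import pandas as pd
from bs4 import BeautifulSoup


def get_review(html, asin):
    """Extract review texts from one review page for the given product."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Assumed markup: each review sits in a div with data-hook="review".
    for block in soup.select('div[data-hook="review"]'):
        body = block.select_one('span[data-hook="review-body"]')
        if body is None:
            continue  # skip reviews that have no text body
        rows.append({"Asin": asin, "Review": body.get_text(" ", strip=True)})
    return pd.DataFrame(rows, columns=["Asin", "Review"])
```

Storing the Asin next to each review makes it possible to join the reviews back to the product table later.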
