- The get_data function retrieves the data from each page and appends it to the DataFrame
- Then, loop through all pages and save the dataset to a .csv file
- Libraries: matplotlib, pandas, seaborn
- First, duplicates in the "Asin" column are removed
- Then, the Price, Rating and Number of Rating columns are converted to numeric types for analysis
- dataFrame.describe() and a boxplot are used to inspect the distribution of product prices
-----------------
count 250.00000
mean 73.25436
std 72.46441
min 6.79000
25% 29.99000
50% 41.19000
75% 92.49000
max 619.80000
Name: Price, dtype: float64
-----------------
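
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, the helper name clean_products is hypothetical, and the column names "Asin", "Price", "Rating" and "Number of Rating" are assumed from the names used in this description.

```python
import pandas as pd

# Tiny made-up sample standing in for the scraped dataset (illustrative values only).
raw = pd.DataFrame({
    "Asin": ["A1", "A2", "A2", "A3"],
    "Price": ["19.99", "5.50", "5.50", "n/a"],
    "Rating": ["4.5", "4.0", "4.0", "3.8"],
    "Number of Rating": ["1,200", "35", "35", "410"],
})


def clean_products(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop repeated ASINs and convert columns to numeric."""
    # Keep the first row for each ASIN.
    out = df.drop_duplicates(subset="Asin").copy()
    for col in ["Price", "Rating", "Number of Rating"]:
        # Strip thousands separators, then turn unparseable values into NaN.
        text = out[col].astype(str).str.replace(",", "", regex=False)
        out[col] = pd.to_numeric(text, errors="coerce")
    return out


products = clean_products(raw)
# Summary statistics of the price column, in the same form as the output above.
print(products["Price"].describe())
```

With matplotlib installed, products.boxplot(column="Price") would draw the corresponding boxplot. Coercing unparseable values to NaN is one possible choice; describe() skips NaN values when computing its statistics.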
- Filter the top 20 products by Rating and draw a scatter plot of them
- We may want to buy the product with Asin B09J8HTDHX
- Libraries: requests, BeautifulSoup, pandas, nltk
- The get_review function retrieves all reviews of the products
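
Returning to the rating-based filtering step described earlier, a minimal sketch might look like the following. The helper names top_rated and plot_top are hypothetical and the sample data is made up. Breaking ties by the number of ratings and plotting Price against Rating are assumptions, since the description only says that the top 20 products are chosen by Rating and shown in a scatter plot.

```python
import pandas as pd


def top_rated(df: pd.DataFrame, n: int = 20) -> pd.DataFrame:
    """Return the n rows with the highest Rating.

    Ties are broken by "Number of Rating" (an assumption, not from the source).
    """
    ordered = df.sort_values(["Rating", "Number of Rating"], ascending=False)
    return ordered.head(n)


def plot_top(top: pd.DataFrame) -> None:
    """Scatter plot of Price against Rating (axis choice is an assumption)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    ax = top.plot.scatter(x="Price", y="Rating")
    ax.set_title("Top-rated products")
    plt.show()


# Made-up sample data for illustration.
sample = pd.DataFrame({
    "Asin": ["A1", "A2", "A3", "A4"],
    "Price": [19.99, 42.00, 35.50, 9.99],
    "Rating": [4.5, 4.8, 4.8, 3.9],
    "Number of Rating": [100, 20, 300, 50],
})

best = top_rated(sample, n=2)
print(best[["Asin", "Rating", "Number of Rating"]])
```

Calling plot_top(best) would then display the scatter plot for the selected products.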


