NLP for Amazon Hulkman Jump Starter Reviews

Part 1 Load Data and Data Cleaning: Split the data into good_review and bad_review
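
A minimal sketch of the split, assuming the reviews sit in a pandas DataFrame. The column names `rating` and `text`, the sample rows, and the thresholds (4 stars and up counts as good, 2 stars and below as bad, 3 stars dropped as neutral) are all assumptions for illustration, not necessarily what the project uses.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset's columns are not described in this README.
reviews = pd.DataFrame({
    "rating": [5, 1, 4, 2, 3, None],
    "text": [
        "Started my truck instantly ",
        "Died after a week",
        "Works well",
        "  Would not charge",
        "It is okay",
        "no rating recorded",
    ],
})


def split_reviews(df, good_min=4, bad_max=2):
    """Clean the reviews and split them by star rating (thresholds are assumptions)."""
    # Drop rows that are missing a rating or review text.
    cleaned = df.dropna(subset=["rating", "text"]).copy()
    # Trim surrounding whitespace and discard empty reviews.
    cleaned["text"] = cleaned["text"].str.strip()
    cleaned = cleaned[cleaned["text"] != ""]
    good = cleaned[cleaned["rating"] >= good_min].reset_index(drop=True)
    bad = cleaned[cleaned["rating"] <= bad_max].reset_index(drop=True)
    return good, bad


good_review, bad_review = split_reviews(reviews)
```

Dropping the middle rating keeps the two groups clearly separated, which tends to make the later topic comparison easier to read.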

Part 2 Tokenizing and Stemming: Use the nltk package and hand-write two functions: tokenization_and_stemming(text) and tokenization_and_lemmatization(text)
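
One possible shape for tokenization_and_stemming(text) is sketched below. It is not the project's actual implementation: the regex tokenizer and the tiny stop-word set are assumptions chosen so the example runs without downloading any nltk data. A lemmatizing variant could swap in nltk's WordNetLemmatizer, which needs the WordNet corpus downloaded first.

```python
import re

from nltk.stem import PorterStemmer

# Small illustrative stop-word list (an assumption); nltk.corpus.stopwords
# offers a fuller list but requires a one-time download.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "my", "to", "of", "in", "this", "was"}

stemmer = PorterStemmer()


def tokenization_and_stemming(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words, and stem the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stemmer.stem(token) for token in tokens if token not in STOP_WORDS]


tokens = tokenization_and_stemming("This charger started my cars quickly")
print(tokens)
```

Stemming collapses inflected forms such as "cars" and "started" onto a common root, so the clustering and topic models see fewer distinct terms.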

Part 3 K-means clustering

Part 4 Topic Modeling - Latent Dirichlet Allocation

Part 5 Result Comparison: Compare the topics between good reviews and bad reviews
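
One simple way to make the comparison concrete is to look at which top topic words are shared and which appear on only one side. The sketch below assumes each side's topics are available as lists of top words; the function `compare_topics` and the word lists are hypothetical and are not results from the actual dataset.

```python
def compare_topics(good_topics, bad_topics):
    """Compare the top topic words of good and bad reviews using set operations."""
    good_words = {word for topic in good_topics for word in topic}
    bad_words = {word for topic in bad_topics for word in topic}
    return {
        "shared": sorted(good_words & bad_words),
        "good_only": sorted(good_words - bad_words),
        "bad_only": sorted(bad_words - good_words),
    }


# Made-up stemmed top words, not output from the real reviews.
good_topics = [["start", "batteri", "fast"], ["charg", "light", "easi"]]
bad_topics = [["batteri", "dead", "return"], ["charg", "stop", "refund"]]

result = compare_topics(good_topics, bad_topics)
print(result)
```

Words that show up on both sides point to features customers care about in general, while the one-sided words hint at what drives praise or complaints.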


About the dataset:

I scraped the dataset for this project myself using Python (Beautiful Soup) and basic HTML knowledge.

The HULKMAN product reviews page is https://www.amazon.com/HULKMAN-Alpha85-Starter-20000mAh-Portable/product-reviews/B08M41FX48/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews