Part 1 Load Data and Data Cleaning: Split the data into good_review and bad_review
Part 2 Tokenizing and Stemming: Use the nltk package and write two functions by hand: tokenization_and_stemming(text) and tokenization_and_lemmatization(text)
Part 3 K-means clustering
Part 4 Topic Modeling - Latent Dirichlet Allocation
Part 5 Result Comparison: Compare the topics between good reviews and bad reviews
About the dataset:
I scraped the dataset for this project myself using Python (Beautiful Soup) and basic HTML knowledge.
The HULKMAN product reviews page is https://www.amazon.com/HULKMAN-Alpha85-Starter-20000mAh-Portable/product-reviews/B08M41FX48/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews
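
As a rough sketch of the parsing step in such a scrape, the snippet below extracts the star rating and review text from an HTML fragment with Beautiful Soup. The fragment is made up for illustration, the data-hook attribute values are assumptions about the page markup rather than something stated here, and parse_reviews is a hypothetical helper. No request is sent to the site.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for one downloaded reviews page.
# The data-hook values are assumptions about the real markup.
sample_html = """
<div data-hook="review">
  <i data-hook="review-star-rating"><span>5.0 out of 5 stars</span></i>
  <span data-hook="review-body"><span>Started my truck on the first try.</span></span>
</div>
<div data-hook="review">
  <i data-hook="review-star-rating"><span>1.0 out of 5 stars</span></i>
  <span data-hook="review-body"><span>Stopped charging after two weeks.</span></span>
</div>
"""


def parse_reviews(html):
    """Return a list of {'stars': float, 'text': str} dicts, one per review."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all("div", attrs={"data-hook": "review"}):
        rating = block.find(attrs={"data-hook": "review-star-rating"})
        body = block.find(attrs={"data-hook": "review-body"})
        if rating is None or body is None:
            continue  # skip blocks missing either field
        # Rating text looks like "5.0 out of 5 stars"; keep the leading number.
        stars = float(rating.get_text(strip=True).split()[0])
        rows.append({"stars": stars, "text": body.get_text(strip=True)})
    return rows


rows = parse_reviews(sample_html)
for row in rows:
    print(row["stars"], row["text"])
```

The resulting rows carry both the rating and the text, which is what the split into good_review and bad_review needs.
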