GitHub - deepnz/miniMongoDB_project: (Cosine similarity algorithm) Sentiment Analyzer of Reviews- Given a set of reviews, we developed a super simple “sentiment” analyzer for these reviews. From two collections of words (positive, negative respectively), we counted which words prevail in a review (algorithm below)and are provided and represented in JSON format The task was to categ

deepnz / miniMongoDB_project Public

forked from deepn2/miniMongoDB_project

Repository files:
.DS_Store
BaseConnection.class
MongoConnection.java
README
Review.java
UnlabelReview.json
UnlabelReviewAfterSplitting.json


Sentiment of Reviews 


Algorithm used:
Set the sentiment score to 0;
Scan each word A in a review R:
If A is a positive word, sentiment score = sentiment score + 1;
If A is a negative word, sentiment score = sentiment score - 1;
After scanning all words, if the sentiment score ≥ 0, the review is positive; if the sentiment score < 0, the review is negative.
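
The steps above can be sketched in Java roughly as follows. This is only an illustration: the class name SentimentScorer, the methods score and label, the tiny word sets, and the rule for splitting a review into words are all assumptions, not code from this repository.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only: names and word lists are hypothetical.
class SentimentScorer {

    // Adds 1 for every positive word and subtracts 1 for every negative word.
    static int score(String review, Set<String> positive, Set<String> negative) {
        int score = 0;
        // Assumption: a word is a run of letters or apostrophes, compared in lower case.
        for (String word : review.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (positive.contains(word)) {
                score += 1;
            }
            if (negative.contains(word)) {
                score -= 1;
            }
        }
        return score;
    }

    // A score of 0 or more is labelled positive, anything below 0 negative.
    static String label(int score) {
        return score >= 0 ? "positive" : "negative";
    }

    public static void main(String[] args) {
        Set<String> positive = new HashSet<>(Arrays.asList("like", "fantastic"));
        Set<String> negative = new HashSet<>(Arrays.asList("hate"));
        String review = "I like the movie, it's fantastic although I hate the actor James in the movie";
        int s = score(review, positive, negative);
        System.out.println(s + " -> " + label(s)); // prints "1 -> positive"
    }
}
```

Note that a review containing no listed words scores 0 and is therefore labelled positive, which follows directly from the ≥ 0 rule.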


Datasets Description:
UnlabelReview.json: Contains the review id and the review content. Each review is raw text from a customer, written as a paragraph.
Example: { id: “5201_1”, review: “I like the movie, it’s fantastic although I hate the actor James in the movie” }

UnlabelReviewAfterSplitting.json: Contains the review id and the review content, with each review split into words and their counts. It is derived from UnlabelReview.json. Some uninformative words, such as the stop words “is”, “are”, and “do”, have been omitted.
Example: { id: “5201_1”, review: [{word: “like”, count: 2}, {word: “movie”, count: 2}, {word: “actor”, count: 1}, {word: “fantastic”, count: 1}, {word: “hate”, count: 1}] }
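
A minimal sketch of how a raw review might be turned into word counts is shown below. It does not claim to reproduce how UnlabelReviewAfterSplitting.json was actually generated: the class ReviewSplitter, the method wordCounts, the stop-word list, and the splitting rule are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: not the code used to build the dataset.
class ReviewSplitter {

    // Hypothetical stop-word list; the list used for the real dataset is not given.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("is", "are", "do", "the", "i", "it", "in"));

    // Splits a review into lower-case words, drops stop words, and counts the rest.
    static Map<String, Integer> wordCounts(String review) {
        // LinkedHashMap keeps words in order of first appearance.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : review.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue;
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {movie=2, fantastic=1, like=1}
        System.out.println(wordCounts("The movie is fantastic, I like the movie"));
    }
}
```

With counts available, the scoring rule could add or subtract each word's count instead of 1 per occurrence, which gives the same score as scanning the raw text word by word.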


Additional File:
positive words.txt: a list of positive words you may use when categorizing the review
negative words.txt: a list of negative words you may use when categorizing the review
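
One way to read such a word list into memory is sketched below. The layout of the word files is not described here, so the sketch assumes one word per line in UTF-8. The class WordListLoader and its load method are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only: assumes one word per line, which may not match the real files.
class WordListLoader {

    // Reads the file, trims and lower-cases each line, and skips blank lines.
    static Set<String> load(Path file) throws IOException {
        Set<String> words = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String word = line.trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Demonstration with a temporary file standing in for a real word list.
        Path tmp = Files.createTempFile("positive-words", ".txt");
        Files.write(tmp, Arrays.asList("good", "", " Fantastic "), StandardCharsets.UTF_8);
        System.out.println(load(tmp).contains("fantastic")); // prints "true"
        Files.delete(tmp);
    }
}
```

Storing the words in a set keeps each lookup cheap while the words of a review are being scanned.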