This project applies topic modeling to help us understand a large body of text. The text is a database of people's happy moments, covering a wide range of things that make people feel happy. After pre-cleaning the text and removing stopwords, we compare topics across age group, gender, parenthood, and marital status, so that we can observe how they differ between groups of people. Topic modeling works well for this project and could be extended to related research topics, such as issues facing single mothers, divorce, and aging. The final report summarizes what we explored and learned from people's happy moments across these categories.
HappyDB is a corpus of 100,000 crowd-sourced happy moments. The goal of the corpus is to advance the state of the art in understanding the causes of happiness that can be gleaned from text. Each respondent is asked 3 questions about the happy moments he or she had during the past month. Some examples are as follows:
Achievements:
- I made a new recipe for peasant bread, and it came out spectacular!
- I was shorting Gold and made $200 from the trade.
- Managed to get the final trophy in a game I was playing.
Affections:
- I went on a successful date with someone I felt sympathy and connection with.
- I was happy when my son got 90% marks in his examination
- I went with grandchildren to butterfly display at Crohn Conservatory
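
As a rough illustration of the pre-cleaning and stopword-removal step mentioned in the summary, the example moments above can be tokenized and counted per group. This is a minimal sketch, not the project's actual pipeline: the stopword list, the `clean` and `term_counts_by_group` helpers, and the use of the two example categories as the grouping field are all assumptions made for illustration. It produces per-group term counts only, which is the kind of input a topic model would take; it does not fit a topic model.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption; the project's real list is not shown).
STOPWORDS = {
    "i", "a", "an", "the", "and", "it", "was", "in", "on", "to", "of",
    "for", "with", "my", "his", "when", "from", "out", "at", "someone",
}

# The example moments quoted above, labeled with the category they appear under.
MOMENTS = [
    ("Achievements", "I made a new recipe for peasant bread, and it came out spectacular!"),
    ("Achievements", "I was shorting Gold and made $200 from the trade."),
    ("Achievements", "Managed to get the final trophy in a game I was playing."),
    ("Affections", "I went on a successful date with someone I felt sympathy and connection with."),
    ("Affections", "I was happy when my son got 90% marks in his examination"),
    ("Affections", "I went with grandchildren to butterfly display at Crohn Conservatory"),
]


def clean(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def term_counts_by_group(records):
    """Accumulate cleaned-token counts separately for each group label."""
    counts = defaultdict(Counter)
    for group, text in records:
        counts[group].update(clean(text))
    return dict(counts)


if __name__ == "__main__":
    for group, counter in term_counts_by_group(MOMENTS).items():
        print(group, counter.most_common(5))
```

The same grouping could be keyed on age group, gender, parenthood, or marital status in place of the category label, assuming those fields are available for each moment.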
Following suggestions by RICH FITZJOHN (@richfitz), this folder is organized as follows.
proj/
├── lib/
├── data/
├── doc/
├── figs/
└── output/
Please see each subfolder for a README file.
