📌 Author : Minku Koo
📌 Project Period : Dec/2020 ~ Jan/2021
📌 Contact : corleone@kakao.com
📌 Main Library : tensorflow, keras, KoNLPy
📌 Keyword : "Sentiment Analysis", "Machine Learning", "Korean", "Deep Learning"
- Introduction
- Data Scraping
- Data Labeling
- Data Preprocessing
- Build Deep Learning Network
- Predict Data Sentiments
- Result
- Python Crawler : ./python-code/comment_crawling.py
- Target Sites : Naver and Daum news comments
- Scraped Data : Comment, Reply, Article Date (+ Title, Content)
- News Search Keywords : "기독교" (Christianity), "불교" (Buddhism), "천주교" (Catholicism), "신천지" (Shincheonji), "종교" (religion)
- Data Storage : Database (MariaDB)
- Database Export to Text Files - path : ./comment/raw-comment/
Search Keyword | Collection Start | Reference Date | Collection End |
---|---|---|---|
신천지 (Shincheonji) | 19.09.17 | 20.02.17 | 20.07.18 |
기독교 (Christianity) | 19.08.20 | 20.01.20 | 20.10.20 |
천주교 (Catholicism) | 19.08.20 | 20.01.20 | 20.08.20 |
불교 (Buddhism) | 19.08.20 | 20.01.20 | 20.08.20 |
종교 (religion) | 19.08.20 | 20.01.20 | 20.10.10 |
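
The before/after split implied by the table can be sketched as follows. This is a minimal illustration, not the project's own code: the names `WINDOWS`, `parse_day`, and `period_of` are hypothetical, the dates are read as YY.MM.DD, and treating the reference date itself as part of the "after" period is an assumption.

```python
from datetime import date, datetime

# Collection windows copied from the table above (YY.MM.DD):
# (collection start, reference date, collection end)
WINDOWS = {
    "신천지": ("19.09.17", "20.02.17", "20.07.18"),
    "기독교": ("19.08.20", "20.01.20", "20.10.20"),
    "천주교": ("19.08.20", "20.01.20", "20.08.20"),
    "불교": ("19.08.20", "20.01.20", "20.08.20"),
    "종교": ("19.08.20", "20.01.20", "20.10.10"),
}


def parse_day(text: str) -> date:
    """Parse a YY.MM.DD string such as '20.01.20'."""
    return datetime.strptime(text, "%y.%m.%d").date()


def period_of(keyword: str, day: date):
    """Return 'before', 'after', or None if the day is outside the window."""
    start, reference, end = (parse_day(t) for t in WINDOWS[keyword])
    if not (start <= day <= end):
        return None
    # Assumption: the reference date itself belongs to the "after" period.
    return "before" if day < reference else "after"
```

For example, `period_of("기독교", date(2019, 12, 31))` falls in the "before" period under these assumptions.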
Search Keyword | Articles (Before) | Comments (Before) | Articles (After) | Comments (After) |
---|---|---|---|---|
신천지 (Shincheonji) | 211 | 22,658 | 2,974 | 262,840 |
기독교 (Christianity) | 1,771 | 94,405 | 1,186 | 85,443 |
천주교 (Catholicism) | 1,899 | 37,010 | 1,685 | 56,881 |
불교 (Buddhism) | 833 | 6,465 | 420 | 7,585 |
종교 (religion) | 1,939 | 52,527 | 2,373 | 122,206 |
- path : ./train-data/
- Manually Labeled Comments (human inspection) : ./train-data/comment-labeling.csv
- Naver Movie Review Data : naver-ratings.csv
- ( Data from Here )
- POS Tagging : okt.pos(comment)
- Remove tokens tagged 'Josa' (particle), 'Punctuation', 'Number'
- Save path : ./comment/after-okt-comment/
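
The filtering step above can be sketched as follows. This is a minimal sketch under assumptions: `DROP_TAGS` and `filter_tokens` are hypothetical names, and whether the project keeps only the words or the (word, tag) pairs after filtering is not stated. The KoNLPy call is shown only in a comment so the snippet runs without a Java runtime; the sample list is hand-written, not real tagger output.

```python
# Tags named in the preprocessing step; these are tag strings used by Okt.
DROP_TAGS = {"Josa", "Punctuation", "Number"}


def filter_tokens(tagged):
    """Keep the words whose POS tag is not in DROP_TAGS."""
    return [word for word, tag in tagged if tag not in DROP_TAGS]


# With KoNLPy installed, the tagged list would come from:
#   from konlpy.tag import Okt
#   tagged = Okt().pos(comment)
# Hand-written stand-in for that output (illustrative only):
tagged = [
    ("영화", "Noun"),
    ("가", "Josa"),
    ("재밌다", "Adjective"),
    ("!", "Punctuation"),
    ("10", "Number"),
]

print(filter_tokens(tagged))  # ['영화', '재밌다']
```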
- Python File Name : ./python-code/make_rnn_model.py
- Train Data path : ./train-data/
- Crawled Comments + Naver Movie Review => Transfer Learning
- Convert comment text to vectors (using TextVectorization)
- Training Accuracy : 0.95
- Validation Accuracy : 0.83
- Build a JSON file : dict[date][article] = [[comment list], []]
- Label every comment with the deep learning model
- Update the JSON file : dict[date][article] = [[comment list], [sentiment value list]] (path: ./comment/json-okt-comment)
- Calculate a sentiment value per date
- Each article's sentiment : weighted average (weight = article comment count / date comment count)
- Each date's sentiment : computed using IMDb's rating system
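
The aggregation described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: sentiment values are assumed to be model scores between 0 and 1, the function names are hypothetical, and the second function is the commonly cited IMDb weighted-rating formula, WR = v/(v+m) × R + m/(v+m) × C. How the project maps its data onto R, v, C, and m is not stated in this README.

```python
def date_sentiment(day):
    """Comment-count-weighted average of per-article mean sentiment.

    `day` follows the JSON layout above:
    {article: [[comment list], [sentiment value list]]}
    """
    total = sum(len(values) for _, values in day.values())
    if total == 0:
        return None
    score = 0.0
    for _, values in day.values():
        if values:
            article_mean = sum(values) / len(values)
            # Weight = article comment count / date comment count
            score += (len(values) / total) * article_mean
    return score


def imdb_weighted_rating(R, v, C, m):
    """Commonly cited IMDb formula: pulls a mean R based on v votes
    toward a prior mean C, where m is the prior's weight in votes."""
    return (v / (v + m)) * R + (m / (v + m)) * C


# Hypothetical sample data for one date
day = {
    "article-1": [["c1", "c2", "c3"], [1.0, 0.0, 0.5]],
    "article-2": [["c4"], [0.0]],
}
print(date_sentiment(day))  # 0.375
```

The weighted-rating form keeps dates or articles with very few comments from dominating, since their scores are pulled toward the prior mean `C`; the choice of `m` and `C` here is left open because the source does not specify them.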
Search Keyword | Mean (Before) | Std Dev (Before) | Mean (After) | Std Dev (After) |
---|---|---|---|---|
신천지 (Shincheonji) | 0.381 | 0.412 | 0.313 | 0.388 |
기독교 (Christianity) | 0.310 | 0.372 | 0.276 | 0.371 |
천주교 (Catholicism) | 0.375 | 0.405 | 0.284 | 0.377 |
불교 (Buddhism) | 0.356 | 0.392 | 0.272 | 0.369 |
종교 (religion) | 0.313 | 0.376 | 0.271 | 0.367 |
(path : ./result-graph/emotion-average-stick/)
(path : ./result-graph/emotion-flow/)
(path : ./result-graph/comment-count/)
(path : ./result-graph/word-cloud/)
✔ Before COVID-19, 기독교 (Christianity)
✔ After COVID-19, 기독교 (Christianity)