Comment-Sentiment-Analysis

Comment Sentiment Analysis using Deep Learning

📌 Author : Minku Koo

📌 Project Period : Dec/2020 ~ Jan/2021

📌 Contact : corleone@kakao.com

📌 Main Library : tensorflow, keras, KoNLPy

📌 Keyword : "Sentiment Analysis", "Machine Learning", "Korean", "Deep Learning"

📃 Table of Contents

1. Scraping Comment Data

  • Python Crawler : ./python-code/comment_crawling.py
  • Target Sites : Naver and Daum news comments
  • Scraped Data : Comment, Reply, Article Date (+ Title, Content)
  • News Search Keywords : "기독교" (Christianity), "불교" (Buddhism), "천주교" (Catholicism), "신천지" (Shincheonji), "종교" (religion)
  • Data Storage : Database (MariaDB)
  • Database export to text files - path : ./comment/raw-comment/

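🔍 Scraping Period per Religion

Dates are in YY.MM.DD format.

| Search keyword | Collection start | Reference date | Collection end |
|---|---|---|---|
| 신천지 (Shincheonji) | 19.09.17 | 20.02.17 | 20.07.18 |
| 기독교 (Christianity) | 19.08.20 | 20.01.20 | 20.10.20 |
| 천주교 (Catholicism) | 19.08.20 | 20.01.20 | 20.08.20 |
| 불교 (Buddhism) | 19.08.20 | 20.01.20 | 20.08.20 |
| 종교 (religion) | 19.08.20 | 20.01.20 | 20.10.10 |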

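🔍 Scraped Data Result

| Search keyword | Articles (before period) | Comments (before period) | Articles (after period) | Comments (after period) |
|---|---|---|---|---|
| 신천지 (Shincheonji) | 211 | 22,658 | 2,974 | 262,840 |
| 기독교 (Christianity) | 1,771 | 94,405 | 1,186 | 85,443 |
| 천주교 (Catholicism) | 1,899 | 37,010 | 1,685 | 56,881 |
| 불교 (Buddhism) | 833 | 6,465 | 420 | 7,585 |
| 종교 (religion) | 1,939 | 52,527 | 2,373 | 122,206 |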

2. Labeling Comment Data

  • path : ./train-data/
  • Manually Labeled Comments : ./train-data/comment-labeling.csv
  • Naver Movie Review Data : naver-ratings.csv
  • ( Data from Here )

3. Using KoNLPy Okt

Text Data Preprocessing

  • Tokenize and POS-tag each comment : okt.pos(comment)
  • Remove tokens tagged 'Josa', 'Punctuation', 'Number'
  • Save path : ./comment/after-okt-comment/
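
The filtering step above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the names `DROP_TAGS`, `filter_tagged`, and `preprocess` are hypothetical, and the tagger is passed in so the filter can be tried without a KoNLPy install (KoNLPy needs a Java runtime).

```python
# Okt POS tags dropped in this step, as listed above.
DROP_TAGS = {"Josa", "Punctuation", "Number"}


def filter_tagged(tagged):
    """Keep the token of every (token, tag) pair whose tag is not dropped."""
    return [token for token, tag in tagged if tag not in DROP_TAGS]


def preprocess(comment, tagger):
    """POS-tag one comment and drop particles, punctuation, and numbers.

    `tagger` is any object with a `.pos(text)` method that returns
    (token, tag) pairs, for example a `konlpy.tag.Okt` instance.
    """
    return filter_tagged(tagger.pos(comment))


# Hand-written (token, tag) pairs standing in for okt.pos() output.
sample = [("영화", "Noun"), ("가", "Josa"), ("재밌다", "Adjective"),
          ("!", "Punctuation"), ("2", "Number")]
kept = filter_tagged(sample)
```

With KoNLPy installed, the call would look like `preprocess(comment, Okt())` after `from konlpy.tag import Okt`.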

4. Build Deep Learning Network using Keras

  • Python File Name : ./python-code/make_rnn_model.py
  • Train Data path : ./train-data/
  • Crawled Comments + Naver Movie Reviews => Transfer Learning
  • Comment text converted to vectors (using TextVectorization)
  • Accuracy : 0.95
  • Val Accuracy : 0.83
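
As a rough sketch of the vectorize-then-classify setup described above: the README does not state the network layout, so the Embedding + LSTM + Dense stack, the layer sizes, and the helper names `build_vectorizer` and `build_model` are all assumptions made for illustration. The two sample strings are placeholders, not project data.

```python
import tensorflow as tf


def build_vectorizer(texts, max_tokens=20000, seq_len=40):
    """Fit a TextVectorization layer that maps text to fixed-length int ids."""
    vectorizer = tf.keras.layers.TextVectorization(
        max_tokens=max_tokens,
        output_mode="int",
        output_sequence_length=seq_len,
    )
    vectorizer.adapt(tf.constant(texts))
    return vectorizer


def build_model(vocab_size, embed_dim=64, rnn_units=64):
    """Assumed layout: embedding, one LSTM layer, sigmoid output in [0, 1]."""
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embed_dim),
        tf.keras.layers.LSTM(rnn_units),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


# Placeholder comments, already tokenized and joined with spaces.
texts = ["재밌다 최고", "별로 지루하다"]
vectorizer = build_vectorizer(texts, max_tokens=100, seq_len=8)
model = build_model(vocab_size=vectorizer.vocabulary_size())
scores = model(vectorizer(tf.constant(texts)))  # one score per comment
```

One possible reading of "Transfer Learning" here is to call `model.fit` on the movie reviews first and then continue fitting on the labeled comments; the README does not confirm the exact procedure.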

5. Predict Sentiment Values

  1. Make json file -> dict[date][article] = [[comment list],[]]
  2. Label every comment using the Deep Learning Model
  3. Update json file / dict[date][article] = [[comment list],[sentiment value list]] (path: ./comment/json-okt-comment)
  4. Calculate sentiment value per date
    • each Article sentiment : Weighted Average (weight = article comment count / date comment count)
    • each Date sentiment : using IMDb's rating system
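
The steps above can be sketched as follows, under stated assumptions. The function names are hypothetical, `predict` stands in for the trained model, and `imdb_weighted_rating` uses the commonly cited IMDb weighted-rating form `(v / (v + m)) * R + (m / (v + m)) * C`; the README does not say which variant or which values of `m` and `C` were used.

```python
from statistics import mean


def label_comments(data, predict):
    """Fill data[date][article][1] with one score per comment in [0]."""
    for articles in data.values():
        for pair in articles.values():
            pair[1] = [predict(comment) for comment in pair[0]]
    return data


def date_sentiment(articles):
    """Average of per-article mean sentiment, weighted by each article's
    share of the date's comments (article count / date count)."""
    total = sum(len(scores) for _, scores in articles.values())
    if total == 0:
        return None
    return sum(len(scores) / total * mean(scores)
               for _, scores in articles.values() if scores)


def imdb_weighted_rating(R, v, m, C):
    """Pull a date's score R (from v comments) toward the global mean C.

    m is the prior weight: the smaller v is relative to m, the closer
    the result is to C.
    """
    return (v / (v + m)) * R + (m / (v + m)) * C


# dict[date][article] = [[comment list], [sentiment value list]]
data = {"20.01.20": {"a1": [["x", "y"], []],
                     "a2": [["p", "q", "r", "s"], []]}}
label_comments(data, lambda c: 1.0 if c in "pqrsx" else 0.0)
day_score = date_sentiment(data["20.01.20"])
```

The shrinkage in `imdb_weighted_rating` keeps dates with very few comments from producing extreme values.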

6. RESULT (Make Graph)

📍 Average, Standard Deviation / Religion

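| Search keyword | Mean (before period) | Std. dev. (before period) | Mean (after period) | Std. dev. (after period) |
|---|---|---|---|---|
| 신천지 (Shincheonji) | 0.381 | 0.412 | 0.313 | 0.388 |
| 기독교 (Christianity) | 0.310 | 0.372 | 0.276 | 0.371 |
| 천주교 (Catholicism) | 0.375 | 0.405 | 0.284 | 0.377 |
| 불교 (Buddhism) | 0.356 | 0.392 | 0.272 | 0.369 |
| 종교 (religion) | 0.313 | 0.376 | 0.271 | 0.367 |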
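
A per-period summary like the table above could be computed along these lines. This is only a sketch: it assumes comment-level scores grouped by YY.MM.DD date strings, the population standard deviation, and that the reference date itself belongs to the "after" period. None of these choices is stated in the README, and `summarize_periods` is a hypothetical name.

```python
from datetime import datetime
from statistics import mean, pstdev


def parse_date(text):
    """Parse a YY.MM.DD string such as '20.01.20'."""
    return datetime.strptime(text, "%y.%m.%d").date()


def summarize_periods(scores_by_date, reference_date):
    """Return {'before': (mean, std), 'after': (mean, std)} around a date."""
    ref = parse_date(reference_date)
    before, after = [], []
    for date_text, scores in scores_by_date.items():
        bucket = before if parse_date(date_text) < ref else after
        bucket.extend(scores)
    return {name: (mean(values), pstdev(values))
            for name, values in (("before", before), ("after", after))
            if values}


summary = summarize_periods(
    {"20.01.19": [0.0, 1.0], "20.01.20": [0.5], "20.01.21": [0.5]},
    reference_date="20.01.20",
)
```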

📍 Sentiment Average bar graph / Religion

(path : ./result-graph/emotion-average-stick/)

📍 Sentiment time flow graph

(path : ./result-graph/emotion-flow/)

  • Before COVID19 : green
  • After COVID19 : red
  • y axis
    • close to 1 : Positive
    • close to 0 : Negative

(Figures: sentiment time flow graphs for 천주교 (Catholicism) and 종교 (religion))

📍 Total Comment Count per Month / Religion

(path : ./result-graph/comment-count/)

📍 WordCloud / Religion

(path : ./result-graph/word-cloud/)

✔ Before COVID19, 기독교 (Christianity)

✔ After COVID19, 기독교 (Christianity)

📍 Top 30 Word / Religion

(path : ./result-graph/word-cloud/)

✔ Before COVID19, 신천지 (Shincheonji)

✔ After COVID19, 신천지 (Shincheonji)
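
A top-N word list such as the Top 30 above can be produced with a simple frequency count. The sketch below assumes each comment is already a list of tokens (for example, the Okt output after filtering); `top_words` is a hypothetical helper, not the repository's function.

```python
from collections import Counter


def top_words(tokenized_comments, n=30):
    """Return the n most frequent tokens as (token, count) pairs."""
    counts = Counter()
    for tokens in tokenized_comments:
        counts.update(tokens)
    return counts.most_common(n)


# Placeholder token lists, not project data.
ranking = top_words([["종교", "뉴스"], ["종교"], ["뉴스", "종교"]], n=2)
```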