Merge branch 'master' of https://github.com/jiminAn/Kpop_NLP_Project
Showing 20 changed files with 5,762 additions and 63 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# Data preprocessing | ||
#### kpop_전처리.ipynb - stopword removal, word tokenization | ||
#### LDA_preprocessing.tsv - final TSV | ||
|
||
|
||
# LDA | ||
#### kpop_LDA.ipynb - extracts nouns only, then applies the LDA model | ||
#### kpop_LDA_2.ipynb - adds verbs and adjectives | ||
|
||
|
||
# prototype_ver2 | ||
#### LDA_thema.tsv - the final TSV with mood added | ||
#### prototype_ver2.py - prototype | ||
#### lda_dict - lyrics converted to a dictionary | ||
#### ----Saved files from model training--- | ||
#### model.h5 | ||
#### model.h5.expElogbeta.npy | ||
#### model.h5.id2word | ||
#### model.h5.state |
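The preprocessing step listed in this README (stopword removal, then word tokenization) could look roughly like the sketch below. It is an illustration only, not the notebook's code: the `STOPWORDS` set, the `tokenize` helper, and the sample lyric are hypothetical, and a simple regular expression stands in for whatever tokenizer the notebook actually uses (Korean text usually calls for a morphological analyzer).

```python
import re

# Hypothetical stopword list; the project's real list is not shown in this commit.
STOPWORDS = {"the", "a", "oh", "yeah", "나", "너"}

# Matches runs of Hangul syllables or ASCII letters (lyrics mix Korean and English).
TOKEN_RE = re.compile(r"[가-힣]+|[A-Za-z]+")


def tokenize(lyrics: str) -> list[str]:
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = TOKEN_RE.findall(lyrics.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]


if __name__ == "__main__":
    # Made-up lyric fragment, used only to exercise the function.
    print(tokenize("Oh yeah 나 사랑 love the night"))
```

Because the regex splits on anything that is not a letter or Hangul syllable, punctuation and digits are discarded; a stopword attached to a particle (for example `나는`) would not be removed, which is one reason a real pipeline would tokenize at the morpheme level.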
Large diffs are not rendered by default.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11199,4 +11199,4 @@ | |
] | ||
} | ||
] | ||
} | ||
} |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
# LOG | ||
----------------- | ||
## 1. [song crawling](https://github.com/jiminAn/Kpop_NLP_Project/tree/master/%EC%9D%B4%EC%9D%80%ED%9B%84/song%20crawling) files: crawl idol song information from the Melon site | ||
* `try2_create_idol_id(.ipynb)` crawls the IDs that Melon assigns to singers from the 1990s | ||
* `idol_list_with_id(.csv)` output of the notebook above, saved as CSV (with an id column added for each idol) | ||
* `94to95_idol_list(.csv)` subset of the file above containing only the singers from the assigned years (94, 95) | ||
* `melon_crawling20(.ipynb)` for each singer in the assigned years, sorts songs by popularity and crawls song information for 20 songs | ||
* `94_95_kpop_final(.tsv)` output of the notebook above, saved as TSV (final file) | ||
|
||
|
||
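## 2. [modeling](https://github.com/jiminAn/Kpop_NLP_Project/tree/master/%EC%9D%B4%EC%9D%80%ED%9B%84/modeling) files: apply an LDA model and K-means to song lyrics | ||
* `preprocess(.ipynb)` preprocesses the data into the form the algorithms require | ||
* **[LDA] files** | ||
* `LDA modeling_(.ipynb)` (after additional preprocessing) applies an LDA model to the song lyrics (Korean + English) and visualizes the result by theme | ||
* **[K-means] files** | ||
* `k-means(.ipynb)` applies the k-means algorithm to the song lyrics (Korean + English) and groups them into clusters | ||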
|
||
|
||
## 3. [Model training](https://github.com/jiminAn/Kpop_NLP_Project/tree/master/%EC%9D%B4%EC%9D%80%ED%9B%84/Model%20training) files: train a model to improve song theme-classification accuracy | ||
* `Model_training_try13(.ipynb)` adjusts the hyper-parameter to 13, then assigns themes | ||
* `num=13(.tsv)` output of the notebook above, saved as TSV (with a theme column added for each song) | ||
* **[trained data] files** | ||
* `model_13(.h5)` saved file of the model trained above |
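The selection step described for `melon_crawling20` in section 1 of this LOG (sort each singer's songs by popularity, keep 20) can be sketched in plain Python. This is a hedged illustration, not the notebook's implementation: the record keys `artist`, `title`, and `popularity`, the function name `top_songs_per_artist`, and the sample data are all assumptions, and no actual crawling of Melon is shown.

```python
from collections import defaultdict


def top_songs_per_artist(songs, n=20):
    """Return up to n songs per artist, most popular first.

    `songs` is an iterable of dicts with the hypothetical keys
    "artist", "title", and "popularity" (higher means more popular).
    """
    by_artist = defaultdict(list)
    for song in songs:
        by_artist[song["artist"]].append(song)
    # Sort each artist's songs by descending popularity and keep the first n.
    return {
        artist: sorted(items, key=lambda s: s["popularity"], reverse=True)[:n]
        for artist, items in by_artist.items()
    }


if __name__ == "__main__":
    # Made-up records; real rows would come from the crawled song list.
    sample = [
        {"artist": "Cleo", "title": "song A", "popularity": 10},
        {"artist": "Cleo", "title": "song B", "popularity": 30},
        {"artist": "O-24", "title": "song C", "popularity": 5},
    ]
    for artist, picked in top_songs_per_artist(sample, n=1).items():
        print(artist, [s["title"] for s in picked])
```

Grouping first and sorting per group keeps the cap independent for each singer, so an artist with fewer than `n` songs simply contributes everything they have.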
This file was deleted.
File renamed without changes.
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -46,3 +46,4 @@ | |
44,,클레오,Cleo,3,김하나,여,호 엔터테인먼트,101397 | ||
45,,티티마,T.T.Ma,5,소이,여,뮤직 팩토리,100017 | ||
46,,오투포,O-24,3,이가혜,여,은성 기획,100121 | ||
|
File renamed without changes.
File renamed without changes.