Dataset - Here
- Based on dataset.
  a. First used the raw data.
  b. Pre-processed the whole dataset (explained below).
- Based on vocabulary.
  a. Used simple stopwords to create a TfidfVectorizer.
  b. Used a custom vocabulary built from unigrams and bigrams in the tweet data.
  c. Used Twitter GloVe embeddings.
- Random Forest Regressor.
- SVM.
- Simple MLP using Keras (run for 100 epochs).
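
As a rough illustration of the custom-vocabulary option (unigrams and bigrams taken from the tweets), here is a minimal sketch using only the standard library. The `tokenize` and `build_vocabulary` helpers, the `min_count` cutoff, and the sample tweets are hypothetical and are not taken from this project; the actual tokenization and filtering may differ.

```python
import re
from collections import Counter


def tokenize(tweet):
    # Simplified tokenizer (an assumption): lowercase, keep words,
    # hashtags and mentions.
    return re.findall(r"[#@]?\w+", tweet.lower())


def build_vocabulary(tweets, min_count=2):
    """Map every unigram/bigram seen at least `min_count` times to an index."""
    counts = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        counts.update(tokens)  # unigrams
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))  # bigrams
    terms = sorted(term for term, count in counts.items() if count >= min_count)
    return {term: index for index, term in enumerate(terms)}


# Hypothetical sample tweets, for illustration only.
tweets = [
    "so happy today",
    "not happy today",
    "so angry right now",
]
vocab = build_vocabulary(tweets, min_count=2)
print(vocab)
```

A mapping like this could then be handed to scikit-learn's `TfidfVectorizer` through its `vocabulary` parameter together with `ngram_range=(1, 2)`, provided the vectorizer's tokenization matches the one used to build the vocabulary.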
Following are the best results obtained on the test set for each approach.
- Without pre-processing.

| Vocabulary | Best model | Pearsonr | Spearmanr | Pearsonr >= 0.5 | Spearmanr >= 0.5 |
| --- | --- | --- | --- | --- | --- |
| a. Simple stopwords | Random Forest | 0.247100 | 0.222550 | 0.136760 | 0.109948 |
| b. Custom vocabulary | Random Forest | 0.536161 | 0.504543 | 0.400112 | 0.363594 |
| c. Twitter GloVe embedding | SVM | 0.586624 | 0.580586 | 0.429999 | 0.424292 |

- With pre-processing.

| Vocabulary | Best model | Pearsonr | Spearmanr | Pearsonr >= 0.5 | Spearmanr >= 0.5 |
| --- | --- | --- | --- | --- | --- |
| a. Simple stopwords | Random Forest | 0.243917 | 0.217084 | 0.128100 | 0.110319 |
| b. Custom vocabulary | Random Forest | 0.535899 | 0.504324 | 0.399039 | 0.360943 |
| c. Twitter GloVe embedding | SVM | 0.587715 | 0.581532 | 0.432472 | 0.426328 |

- Without pre-processing.

| Vocabulary | Best model | Pearsonr | Spearmanr | Pearsonr >= 0.5 | Spearmanr >= 0.5 |
| --- | --- | --- | --- | --- | --- |
| a. Simple stopwords | Random Forest | 0.312417 | 0.309787 | 0.154492 | 0.160097 |
| b. Custom vocabulary | SVM | 0.474524 | 0.494809 | 0.308389 | 0.307756 |
| c. Twitter GloVe embedding | SVM | 0.606882 | 0.604149 | 0.449256 | 0.442449 |

- With pre-processing.

| Vocabulary | Best model | Pearsonr | Spearmanr | Pearsonr >= 0.5 | Spearmanr >= 0.5 |
| --- | --- | --- | --- | --- | --- |
| a. Simple stopwords | Random Forest | 0.257611 | 0.256136 | 0.157086 | 0.154409 |
| b. Custom vocabulary | SVM | 0.482738 | 0.499942 | 0.310332 | 0.323588 |
| c. Twitter GloVe embedding | SVM | 0.579386 | 0.576765 | 0.382190 | 0.382472 |
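
The four reported metrics (Pearsonr, Spearmanr, and their `>= 0.5` variants) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the evaluation script used here: it assumes the `>= 0.5` rows restrict the correlation to samples whose gold score is at least 0.5, which this README does not define, and the `evaluate` helper and the sample scores are hypothetical. In practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` would be the usual choice.

```python
import math


def pearson(x, y):
    # Pearson correlation; undefined (division by zero) if either
    # input has zero variance.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    # 1-based ranks; tied values share the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))


def evaluate(gold, pred, threshold=0.5):
    # Assumption: the ">= 0.5" metrics use only samples with gold >= threshold.
    high = [(g, p) for g, p in zip(gold, pred) if g >= threshold]
    gold_high = [g for g, _ in high]
    pred_high = [p for _, p in high]
    return {
        "Pearsonr": pearson(gold, pred),
        "Spearmanr": spearman(gold, pred),
        "Pearsonr >= 0.5": pearson(gold_high, pred_high),
        "Spearmanr >= 0.5": spearman(gold_high, pred_high),
    }


# Hypothetical gold scores and predictions, for illustration only.
gold = [0.1, 0.4, 0.5, 0.7, 0.9]
pred = [0.2, 0.3, 0.6, 0.5, 0.8]
scores = evaluate(gold, pred)
for name, value in scores.items():
    print(f"{name}: {value:.6f}")
```

Under that reading, the `>= 0.5` rows use fewer samples than the overall rows, so the two sets of numbers are not directly comparable in sample size.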