One-genre Gender Detection (Russian)
Work | Publishing Year | Corpus | Features | Method used | Result |
---|---|---|---|---|---|
RuSb_base+stylometry |  | PAN FIRE ’17 [d8] | Stylometric features | Russian SBERT | 0.90 |
RuSb_base |  | PAN FIRE ’17 [d8] |  | Russian SBERT | 0.87 |
Korshunov [r6] | 2013 | self-made | word (3-grams) | SVM | 0.86 |
Sboev et al [r4] | 2019 | RusProfiling [d11], PAN FIRE ’17 [d8] | char n-grams | Gradient Boosting | 0.79 |
Sboev et al [r1] | 2020 | LiveJournal [d10] |  | GRU, CVAE | 0.76 |
Markov et al - CIC3 [r2] | 2017 | PAN FIRE ’17 [d8] | statistical |  | 0.6825 |
LDR [d12] | 2017 | PAN FIRE ’17 [d8] | probability distribution of occurrence of tdoc’s words in the different classes |  | 0.6759 |
Markov et al - CIC2 [r2] | 2017 | PAN FIRE ’17 [d8] | BOW, word (suffix 3-grams), tf-idf | SVM | 0.6650 |
Bhargava et al [r3] | 2017 | PAN FIRE ’17 [d8] | POS, rule-based classification | LSTM, Bi-LSTM | 0.6525 |
Markov et al - CIC1 [r2] | 2017 | PAN FIRE ’17 [d8] | POS combination, tf-idf | SVM | 0.6525 |
Cross-genre Gender Detection (Russian)
Work | Publishing Year | Corpus | Features | Method used | Test Corpus | Result |
---|---|---|---|---|---|---|
RuSb_base+stylometry |  | PAN FIRE ’17 [d8] (Twitter - training) | Stylometric features | Russian SBERT | Essays | 0.87 |
RuSb_base |  | PAN FIRE ’17 [d8] (Twitter - training) |  | Russian SBERT | Essays | 0.86 |
RuSb_big |  | RusProfiling [d11], PAN FIRE ’17 [d8] |  | Russian SBERT | Essays | 0.86 |
RuSb_big+stylometry |  | RusProfiling [d11], PAN FIRE ’17 [d8] | Stylometric features | Russian SBERT | Essays | 0.86 |
LDR [d12] | 2017 | PAN FIRE ’17 [d8] (Twitter - training) | stylometric analysis |  | Essays | 0.8141 |
Bhargava et al [r3] | 2017 | PAN FIRE ’17 [d8] (Twitter - training) | POS, rule-based classification | LSTM, Bi-LSTM | Essays | 0.7838 |
Vinayan et al [r5] | 2017 | PAN FIRE ’17 [d8] (Twitter - training) | exotic statistics (average word length, URL usage, etc.), tf-idf | SVM | Essays | 0.6811 |
RuSb_big+stylometry |  | RusProfiling [d11], PAN FIRE ’17 [d8] | Stylometric features | Russian SBERT | Reviews | 0.83 |
RuSb_big |  | RusProfiling [d11], PAN FIRE ’17 [d8] |  | Russian SBERT | Reviews | 0.83 |
RuSb_base+stylometry |  | PAN FIRE ’17 [d8] (Twitter - training) | Stylometric features | Russian SBERT | Reviews | 0.80 |
RuSb_base |  | PAN FIRE ’17 [d8] (Twitter - training) |  | Russian SBERT | Reviews | 0.80 |
Sboev et al [r4] | 2019 | PAN FIRE ’17 [d8] (Twitter - training) | char n-grams | Gradient Boosting | Reviews | 0.79 |
LDR [d12] | 2017 | PAN FIRE ’17 [d8] (Twitter - training) | stylometric analysis |  | Reviews | 0.72 |
Markov et al - CIC3 [r2] | 2017 | PAN FIRE ’17 [d8] (Twitter - training) | statistical |  | Reviews | 0.6186 |
Markov et al - CIC1 [r2] | 2017 | PAN FIRE ’17 [d8] (Twitter - training) | POS combination, tf-idf | SVM | Reviews | 0.5979 |
Bhargava et al [r3] | 2017 | PAN FIRE ’17 [d8] (Twitter - training) | POS, rule-based classification | LSTM, Bi-LSTM | Reviews | 0.5786 |
RuSb_big |  | PAN FIRE ’17 [d8] (Twitter - training) |  | Russian SBERT | Gender imitation | 0.95 |
RuSb_base |  | PAN FIRE ’17 [d8] (Twitter - training) |  | Russian SBERT | Gender imitation | 0.93 |
Bhargava et al [r3] | 2017 | PAN FIRE ’17 [d8] (Twitter - training) | POS, rule-based classification | LSTM, Bi-LSTM | Gender imitation | 0.6596 |
LDR [d12] | 2017 | PAN FIRE ’17 [d8] (Twitter - training) | stylometric analysis |  | Gender imitation | 0.6383 |