This is an implementation of the log-odds ratio method from the paper "Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict".
It was used for language modeling for stance detection in the paper "Knowledge Enhanced Masked Language Model for Stance Detection".
Please see our stance detection repo 🚀
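For reference, the ranking is based on the paper's log-odds ratio with an informative Dirichlet prior. For a word $w$, let $y_w^i$ and $y_w^j$ be its counts in corpus i and corpus j, $n^i$ and $n^j$ the total token counts of the two corpora, $\alpha_w$ the prior pseudo-count for $w$ derived from the background corpus, and $\alpha_0 = \sum_w \alpha_w$. How `log_odds_ratio.py` derives $\alpha_w$ from the background counts is an assumption not spelled out here.

$$
\hat{\delta}_w^{(i-j)} = \log\frac{y_w^i + \alpha_w}{n^i + \alpha_0 - y_w^i - \alpha_w} - \log\frac{y_w^j + \alpha_w}{n^j + \alpha_0 - y_w^j - \alpha_w}
$$

$$
\sigma^2\!\left(\hat{\delta}_w^{(i-j)}\right) \approx \frac{1}{y_w^i + \alpha_w} + \frac{1}{y_w^j + \alpha_w},
\qquad
z_w = \frac{\hat{\delta}_w^{(i-j)}}{\sqrt{\sigma^2\!\left(\hat{\delta}_w^{(i-j)}\right)}}
$$

A large positive $z_w$ means $w$ is over-represented in corpus i; a large negative $z_w$ means it is over-represented in corpus j.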
- Run the following command:
python log_odds_ratio.py \
--filepath_corpus_i=$FP_CORPUS_I \
--filepath_corpus_j=$FP_CORPUS_J \
--filepath_background_corpus=$BACKGROUND_CORPUS
- Among the generated files, check z_scores.txt, which lists words sorted by Z-score. Words at the top are more likely to belong to corpus i, while words at the bottom are more likely to belong to corpus j, relative to the background corpus.
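To illustrate how such a Z-score ranking can be produced, here is a minimal, self-contained Python sketch of the Fightin' Words statistic computed from token counts. It assumes the prior pseudo-counts $\alpha_w$ are the raw background-corpus counts and it skips words that never occur in the background corpus; the function name `log_odds_z_scores` and the toy corpora are hypothetical and are not part of `log_odds_ratio.py`.

```python
import math
from collections import Counter


def log_odds_z_scores(counts_i, counts_j, prior):
    """Rank words by the Z-score of the log-odds ratio with an informative Dirichlet prior.

    Hypothetical helper. Assumption: `prior` maps each word to its pseudo-count
    alpha_w, taken here directly from background-corpus counts.
    Returns (word, z) pairs sorted from most corpus-i-like to most corpus-j-like.
    """
    n_i = sum(counts_i.values())
    n_j = sum(counts_j.values())
    alpha_0 = sum(prior.values())
    scores = {}
    for word, alpha_w in prior.items():
        if alpha_w <= 0:
            continue  # words absent from the background corpus get no prior and are skipped
        y_i = counts_i.get(word, 0)
        y_j = counts_j.get(word, 0)
        # Smoothed log-odds of the word in each corpus.
        log_odds_i = math.log((y_i + alpha_w) / (n_i + alpha_0 - y_i - alpha_w))
        log_odds_j = math.log((y_j + alpha_w) / (n_j + alpha_0 - y_j - alpha_w))
        delta = log_odds_i - log_odds_j
        # Approximate variance of the difference in log-odds.
        variance = 1.0 / (y_i + alpha_w) + 1.0 / (y_j + alpha_w)
        scores[word] = delta / math.sqrt(variance)
    # Descending order: top words favor corpus i, bottom words favor corpus j.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy example; real inputs would be the tokenized corpora passed to the script.
corpus_i = "the cat sat on the mat the cat".split()
corpus_j = "the dog ran in the park the dog".split()
background = corpus_i + corpus_j  # stand-in for a real background corpus
ranked = log_odds_z_scores(Counter(corpus_i), Counter(corpus_j), Counter(background))
print(ranked[0][0], ranked[-1][0])
```

In this toy run, "cat" ranks at the top (over-represented in corpus i), "dog" at the bottom, and "the", which is equally frequent in both corpora, scores zero.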