An end-to-end system for classifying English tweets as offensive or non-offensive, based on the OffensEval 2019 Shared Task (subtask A).
An end-to-end system for classifying Greek tweets as offensive or non-offensive, based on the OffensEval 2020 Shared Task (subtask A).
- GloVe embedding + Bidirectional LSTM -> RoBERTa-base model
- Model fine-tuning and hyperparameter tuning
- Removing diacritics
- Converting Unicode data to ASCII characters
- Lemmatization
- XLM-RoBERTa model
- Model fine-tuning and hyperparameter tuning
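The diacritic-removal and ASCII-conversion steps listed above could be sketched with Python's standard `unicodedata` module. This is only an illustration under assumptions: the function names `strip_diacritics` and `to_ascii` are hypothetical, and the project's own preprocessing may use a different library or different rules.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks (accents, dialytika) while keeping base letters."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the combining accents.
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever is left into canonical composed form.
    return unicodedata.normalize("NFC", kept)


def to_ascii(text: str) -> str:
    """Fold text to ASCII, dropping characters with no ASCII equivalent."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_diacritics("Καλημέρα"))  # Καλημερα
print(to_ascii("café naïve"))        # cafe naive
```

Note that `to_ascii` as written discards Greek letters entirely, so a fold of this kind only suits Latin-script text; for Greek tweets, `strip_diacritics` keeps the base letters intact.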
If necessary, download and install Anaconda by running the following commands:
wget https://repo.anaconda.com/archive/Anaconda3-2021.11-Linux-x86_64.sh
sh Anaconda3-2021.11-Linux-x86_64.sh
- (Not needed for `D4.cmd`) Download the best model for the primary task and place the entire folder (containing `config.json` and `pytorch.bin`) in `models/`.
- Download the best model for the adaptation task and place the entire folder (containing `config.json` and `pytorch.bin`) in `models/`.
- Note that the model for the primary task (the folder containing `config.json` and `pytorch.bin`) should be named `finetune_roberta`, and the model for the adaptation task should be named `finetune_xlmr_large_final_greek`.
- Both models should be accessible to anyone logged into a UW Google account.
- The following is an example of the directory structure of the model for the adaptation task:
models/finetune_xlmr_large_final_greek
models/finetune_xlmr_large_final_greek/config.json
models/finetune_xlmr_large_final_greek/pytorch.bin
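Before submitting the job, it may help to confirm that the model folders are laid out as described. The following is a minimal sketch, not part of the repository: the helper `missing_model_files` is hypothetical, and the folder and file names are copied from the instructions above.

```python
from pathlib import Path

# Folder and file names are taken from the setup instructions above.
EXPECTED_MODELS = ("finetune_roberta", "finetune_xlmr_large_final_greek")
EXPECTED_FILES = ("config.json", "pytorch.bin")


def missing_model_files(models_dir, models=EXPECTED_MODELS):
    """Return the expected model files that are absent under models_dir."""
    root = Path(models_dir)
    missing = []
    for model in models:
        for name in EXPECTED_FILES:
            path = root / model / name
            if not path.is_file():
                missing.append(str(path))
    return missing


if __name__ == "__main__":
    # Hypothetical usage: run from the repository root.
    for path in missing_model_files("models"):
        print(f"missing: {path}")
```

Since the primary-task model is not needed for `D4.cmd`, passing `models=("finetune_xlmr_large_final_greek",)` restricts the check to the adaptation model.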
condor_submit D4.cmd
Notes:
- For the purposes of this deliverable, preprocessing and training are commented out of the main script (`D4_run.sh`).
- The condor script activates an existing conda environment on patas, so there is no need to create or update the conda environment.
In summary, the pipeline:
- Pre-processes the Offensive Greek Twitter Dataset (OGTD) training and test data.
- Fine-tunes a pretrained model (XLM-RoBERTa) on the Greek training data.
- Runs the fine-tuned model on the Greek data and saves the output predictions in `outputs/D4/adaptation/evaltest/D4_greek_preds.csv`.
- Saves the final F1 score in `results/D4/adaptation/evaltest/D4_scores.out`.
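For reference, the scoring step could look roughly like the sketch below. It rests on assumptions: the text above says only "f1-score", and macro-averaging is assumed here because it is the official OffensEval metric; the labels `OFF` and `NOT` follow the OffensEval datasets; the function names are hypothetical and the project's own scorer may differ.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=("OFF", "NOT")):
    """Unweighted mean of the per-class F1 scores (assumed metric)."""
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


gold = ["OFF", "NOT", "OFF", "NOT"]
pred = ["OFF", "NOT", "NOT", "NOT"]
print(round(macro_f1(gold, pred), 4))  # 0.7333
```

Macro-averaging weights both classes equally, which matters when offensive tweets are the minority class.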