ling-573-group-repo

Task Description

Primary Task

An end-to-end system for classifying English tweets as offensive or non-offensive, based on the OffensEval 2019 Shared Task (subtask A).

Adaptation Task

An end-to-end system for classifying Greek tweets as offensive or non-offensive, based on the OffensEval 2020 Shared Task (subtask A).

Changes in D4

Primary Task

Embeddings and Classification

Replaced the GloVe embedding + bidirectional LSTM model with a RoBERTa-base model
Model fine-tuning and hyperparameter tuning

Adaptation Task

Additional Preprocessing

Removing diacritics
Converting Unicode data into ASCII characters
Lemmatization

Embeddings and Classification

XLM-RoBERTa model
Model fine-tuning and hyperparameter tuning

Instructions

1. Prerequisites

Install Anaconda

If necessary, download and install Anaconda by running the following commands:

wget https://repo.anaconda.com/archive/Anaconda3-2021.11-Linux-x86_64.sh
sh Anaconda3-2021.11-Linux-x86_64.sh

Download best models for primary and adaptation tasks

Download the best model for the primary task (not needed for D4.cmd) and place the entire folder (containing config.json and pytorch.bin) in models/.
Download the best model for the adaptation task and place the entire folder (containing config.json and pytorch.bin) in models/.
Note that the folder for the primary-task model should be named finetune_roberta, and the folder for the adaptation-task model should be named finetune_xlmr_large_final_greek.
Both models should be accessible to anyone logged into a UW Google account.
The following is an example of the directory structure for the adaptation-task model:

models/finetune_xlmr_large_final_greek
models/finetune_xlmr_large_final_greek/config.json
models/finetune_xlmr_large_final_greek/pytorch.bin
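Checking that the folders are in place before submitting the job can prevent a failed Condor run. The sketch below is a hypothetical helper: check_model_dir is not part of the repository. It tests only for the two file names listed above and assumes that you run it from the repository root.

```shell
#!/bin/sh
# Hypothetical helper (not from the repository): confirm that models/<name>
# contains the files listed in this README. Run from the repository root.
check_model_dir() {
  dir="models/$1"
  status=0
  for f in config.json pytorch.bin; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      status=1
    fi
  done
  [ "$status" -eq 0 ] && echo "ok: $dir"
  return "$status"
}

# The adaptation model is required by D4.cmd; the primary-task model is not.
if check_model_dir finetune_xlmr_large_final_greek; then
  echo "Adaptation model found; ready for condor_submit D4.cmd"
fi
```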

2. Run the Condor Script

condor_submit D4.cmd

Notes:

For the purposes of this deliverable, preprocessing and training are commented out of the main script (D4_run.sh).
The Condor script activates an existing conda environment on patas, so there is no need to create or update the conda environment.

In summary, the full pipeline:

Preprocesses the Offensive Greek Twitter Dataset (OGTD) training and test data.
Fine-tunes the pretrained XLM-RoBERTa model on the Greek training data.
Runs the fine-tuned model on the Greek test data and saves the output predictions to outputs/D4/adaptation/evaltest/D4_greek_preds.csv.
Saves the final F1 score to results/D4/adaptation/evaltest/D4_scores.out.
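OffensEval subtask A is scored with macro-averaged F1 over the OFF and NOT classes. The hypothetical shell function below computes that metric with awk. It illustrates the metric only and is not the repository's scoring code. It assumes a CSV with a header row, with the gold label in column 1 and the predicted label in column 2. The actual layout of D4_greek_preds.csv may differ, so adjust the field numbers before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not from the repository): macro-averaged F1 over the
# OFF and NOT labels. Assumed input: CSV with a header row, gold label in
# column 1 and predicted label in column 2.
macro_f1() {
  awk -F',' '
    { sub(/\r$/, "") }              # tolerate Windows line endings
    NR > 1 {
      gold = $1; pred = $2
      if (gold == pred) tp[gold]++
      else { fp[pred]++; fn[gold]++ }
    }
    END {
      n = split("OFF NOT", labels, " ")
      sum = 0
      for (i = 1; i <= n; i++) {
        l = labels[i]
        # Per-class precision and recall; 0 when the denominator is 0.
        p = (tp[l] + fp[l]) > 0 ? tp[l] / (tp[l] + fp[l]) : 0
        r = (tp[l] + fn[l]) > 0 ? tp[l] / (tp[l] + fn[l]) : 0
        sum += (p + r) > 0 ? 2 * p * r / (p + r) : 0
      }
      printf "%.4f\n", sum / n      # unweighted mean of per-class F1
    }' "$1"
}

# Usage (column layout is an assumption):
# macro_f1 outputs/D4/adaptation/evaltest/D4_greek_preds.csv
```

Macro averaging weights OFF and NOT equally. A model that labels every tweet NOT therefore gets an OFF F1 of 0, so its macro F1 is at most 0.5.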

kvah/ling-573-offensive-tweet-detection
