Install the Python dependencies inside a virtual environment:
cd sdg_classification
virtualenv venv
source venv/bin/activate
pip3 install -r requirements.txt
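To train the model on the synthetic dataset, run one of the following configurations: multi-label fine-tuning only, label-description fine-tuning only, or both combined. PROJECT_DIR is assumed to point to the repository root.
python3 "$PROJECT_DIR/src/multi_label_sdg.py" --multi_label_finetuning --dataset=synthetic --do_train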
python3 "$PROJECT_DIR/src/multi_label_sdg.py" --label_desc_finetuning --dataset=synthetic --do_train
python3 "$PROJECT_DIR/src/multi_label_sdg.py" --label_desc_finetuning --multi_label_finetuning --dataset=synthetic --do_train
The synthetic dataset is available at data/synthetic_data/synthetic_final.tsv
To train the model on the Out-of-Domain (OOD) Knowledge Hub dataset:
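The column layout of the TSV file is not described here, so a quick look at the first few rows can help before training. The following is a minimal sketch using only the Python standard library; `preview_tsv` is a hypothetical helper name, and UTF-8 encoding is an assumption.

```python
import csv
from itertools import islice


def preview_tsv(path, n_rows=3):
    """Return the first n_rows rows of a tab-separated file as lists of strings."""
    # Assumption: the file is UTF-8 encoded. QUOTE_NONE keeps quote characters
    # in the text as-is instead of treating them as field delimiters.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(islice(reader, n_rows))
```

Calling `preview_tsv("data/synthetic_data/synthetic_final.tsv")` from the repository root returns the first three rows, each as a list of field strings.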
python3 "$PROJECT_DIR/src/multi_label_sdg.py" --label_desc_finetuning --multi_label_finetuning --dataset=knowledge_hub --do_train
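The training commands above differ only in the fine-tuning flags and the --dataset value. As a minimal sketch, the helper below assembles those command lines in Python. `TRAIN_CONFIGS` and `build_train_command` are hypothetical names introduced for illustration, `PROJECT_DIR` is assumed to point to the repository root, and only flags that appear in the commands above are used.

```python
import os
import shlex

# Flag combinations copied from the commands above. The dictionary keys are
# illustrative labels, not names used by the project itself.
TRAIN_CONFIGS = {
    "multi_label": ["--multi_label_finetuning"],
    "label_desc": ["--label_desc_finetuning"],
    "both": ["--label_desc_finetuning", "--multi_label_finetuning"],
}


def build_train_command(config, dataset="synthetic", project_dir=None):
    """Return the argument list for one training run."""
    # Assumption: PROJECT_DIR (or project_dir) is the repository root.
    root = project_dir or os.environ.get("PROJECT_DIR", ".")
    script = os.path.join(root, "src", "multi_label_sdg.py")
    return [
        "python3",
        script,
        *TRAIN_CONFIGS[config],
        f"--dataset={dataset}",
        "--do_train",
    ]


if __name__ == "__main__":
    for name in TRAIN_CONFIGS:
        # Only print the commands; pass the list to subprocess.run to execute.
        print(shlex.join(build_train_command(name)))
```

Building the arguments as a list rather than a single string avoids shell-quoting problems if the project path contains spaces.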
To evaluate on the manually annotated multi-label scientific SDG dataset:
python3 "$PROJECT_DIR/src/multi_label_sdg.py" --multi_label_finetuning --dataset=synthetic --do_train --do_in_domain_eval
To evaluate on the synthetic SDG dataset:
python3 "$PROJECT_DIR/src/multi_label_sdg.py" --multi_label_finetuning --dataset=synthetic --do_train --do_synthetic_eval
The source code for SBERT fine-tuning and linear classification is largely inspired by SetFit.
The manually annotated dataset of papers from Open Research Online (ORO) is available at data/manually_annotated_oro/oro_gold_dataset.txt (final version)
The source code for the demo page, CORE Labs, is available at:
https://github.com/oacore/about