This repo contains code for the paper: Unleashing the Power of Discourse-Enhanced Transformers for Propaganda Detection
Discourse Analysis.ipynb
contains data preparation, including feature construction and an analysis of the correlation between discourse and propaganda classes.
Deberta Span Classification.ipynb
presents the steps for the span classification task (paragraph classification) with the DeBERTa model: data preparation, model training, inference, and error analysis. The relevant code is in glue_deberta.
Deberta Token Classification.ipynb
presents the steps for the token classification task (NER) with the DeBERTa model. The relevant code is in ner_deberta_multi.
XLM-RoBERTa Span Classification.ipynb
presents the steps for the span classification task with the XLM-RoBERTa model. The relevant code is in glue_xlmroberta.
Main code in folders:
dataset_construction.py
- SemEval-based dataset preparation; adds linguistic features as inputs.
modeling_...py
- model architecture modification (see Fig. Architecture); the main idea is the concatenation of linguistic features and Transformer-based embeddings.
...configuration...py
- extended model configuration with new parameters (e.g., extra_feature_size).
run_...py
- runs a training cycle for the modified architecture, optionally with class weights in the loss function.
inference.py
- inference with the trained model.
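The class-weighted loss option mentioned for run_...py can be sketched as follows; the weight values and tensor shapes here are illustrative, not taken from the repo:

```python
import torch
import torch.nn as nn

# Illustrative class weights: down-weight the majority (non-propaganda)
# class and keep the rarer propaganda classes at full weight. Real values
# would be chosen per dataset, e.g. from inverse class frequencies.
weights = torch.tensor([0.2, 1.0, 1.0])
loss_fn = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(4, 3)            # (batch, num_labels)
labels = torch.tensor([0, 1, 2, 1])   # gold labels
loss = loss_fn(logits, labels)        # scalar, weighted per example
```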
Model Architecture
The model architecture is developed for two tasks: (a) token classification; (b) span classification (paragraph-level). The trainable blocks of the model are shown in blue.
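The feature-concatenation idea can be sketched as a minimal classification head; the class name, sizes, and pooled input are illustrative assumptions, while the real implementations are in the modeling_...py files:

```python
import torch
import torch.nn as nn

class FeatureConcatHead(nn.Module):
    """Concatenates extra linguistic (discourse) features with the
    Transformer embedding before the final classifier layer.
    hidden_size matches the backbone; extra_feature_size corresponds
    to the configuration parameter of the same name."""

    def __init__(self, hidden_size=768, extra_feature_size=16, num_labels=2):
        super().__init__()
        self.classifier = nn.Linear(hidden_size + extra_feature_size, num_labels)

    def forward(self, embeddings, extra_features):
        # embeddings:     (batch, hidden_size) pooled Transformer output
        # extra_features: (batch, extra_feature_size) discourse features
        combined = torch.cat([embeddings, extra_features], dim=-1)
        return self.classifier(combined)

head = FeatureConcatHead()
logits = head(torch.randn(2, 768), torch.randn(2, 16))  # (batch, num_labels)
```

For the token-level task the same concatenation would apply per token, i.e. to (batch, seq_len, hidden_size) embeddings, since nn.Linear acts on the last dimension.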
Data can be downloaded from the official competition website: SemEval2023 Task3
full_parsed_result_train_matched.pkl
contains an example of the output of the dataset_construction stage.
If you find this repository helpful, feel free to cite our publication:
@inproceedings{chernyavskiy-etal-2024-unleashing,
title = "Unleashing the Power of Discourse-Enhanced Transformers for Propaganda Detection",
author = "Chernyavskiy, Alexander and
Ilvovsky, Dmitry and
Nakov, Preslav",
editor = "Graham, Yvette and
Purver, Matthew",
booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = mar,
year = "2024",
address = "St. Julian{'}s, Malta",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.eacl-long.87",
pages = "1452--1462",
abstract = "The prevalence of information manipulation online has created a need for propaganda detection systems. Such systems have typically focused on the surface words, ignoring the linguistic structure. Here we aim to bridge this gap. In particular, we present the first attempt at using discourse analysis for the task. We consider both paragraph-level and token-level classification and we propose a discourse-aware Transformer architecture. Our experiments on English and Russian demonstrate sizeable performance gains compared to a number of baselines. Moreover, our ablation study emphasizes the importance of specific types of discourse features, and our in-depth analysis reveals a strong correlation between propaganda instances and discourse spans.",
}