In this paper, we introduce document-level discourse probing to evaluate the ability of pretrained LMs to capture document-level relations. We experiment with 7 pretrained LMs, 4 languages, and 7 discourse probing tasks, and find BART to be overall the best model at capturing discourse — but only in its encoder.
Fajri Koto, Jey Han Lau, and Timothy Baldwin. Discourse Probing of Pretrained Language Models. In Proceedings of the 20th Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2021), Mexico (virtual).
- Our code is based on Python 3 and the Hugging Face PyTorch framework. All libraries are listed in `requirements.txt`.
- Please run `pip install -r requirements.txt`.
- Similar to the next sentence prediction (NSP) objective in BERT pretraining, but here we frame it as a 4-way classification task, with one positive and 3 negative candidates for the next sentence.
- Folder: `nsp_choice`. To run the code, see the examples in `run.sh`.
- Data for EN, ES, DE, and ZH are provided at `nsp_choice/data/`.
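
The 4-way framing above can be illustrated with a minimal sketch. It is not the repository's data pipeline: the function name `build_nsp_instance`, the distractor pool, and the example sentences are all hypothetical, and the way negatives are actually sampled for the paper may differ.

```python
import random


def build_nsp_instance(context, positive, distractor_pool, rng):
    """Build one 4-way next-sentence instance: 1 positive + 3 negatives.

    Hypothetical helper for illustration only.
    """
    # Exclude the true next sentence so it cannot be drawn as a negative.
    pool = [s for s in distractor_pool if s != positive]
    negatives = rng.sample(pool, 3)
    candidates = negatives + [positive]
    rng.shuffle(candidates)
    # The label is the index of the true next sentence among the candidates.
    label = candidates.index(positive)
    return context, candidates, label


rng = random.Random(0)
context, candidates, label = build_nsp_instance(
    context="She opened the door.",
    positive="A cold wind rushed in.",
    distractor_pool=[
        "The stock market closed higher.",
        "He had never liked spinach.",
        "The train left at noon.",
        "Two plus two equals four.",
    ],
    rng=rng,
)
print(label, candidates)
```

A classifier for this task would then score each of the four candidates against the context and be evaluated on whether it picks index `label`.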
- We shuffle 3–7 sentences and attempt to reproduce the original order.
- Folder: `ordering`. To run the code, see the examples in `run.sh`.
- Data for EN, ES, DE, and ZH are provided at `ordering/data/`.
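
As a rough sketch of the ordering task, the snippet below shuffles a short passage and keeps the permutation needed to restore it. The helper names and the representation of the target (a list of original positions) are assumptions made for illustration; the data format used in `ordering/data/` may be different.

```python
import random


def make_ordering_instance(sentences, rng):
    """Shuffle 3-7 sentences and return (shuffled, order).

    order[k] is the original position of the k-th shuffled sentence.
    Hypothetical helper for illustration only.
    """
    if not 3 <= len(sentences) <= 7:
        raise ValueError("expected between 3 and 7 sentences")
    order = list(range(len(sentences)))
    rng.shuffle(order)
    shuffled = [sentences[i] for i in order]
    return shuffled, order


def restore_order(shuffled, order):
    """Invert the shuffle given the original position of each sentence."""
    restored = [None] * len(shuffled)
    for k, original_pos in enumerate(order):
        restored[original_pos] = shuffled[k]
    return restored


sentences = [
    "Anna woke up late.",
    "She skipped breakfast.",
    "She ran to the station.",
    "The train had already left.",
]
shuffled, order = make_ordering_instance(sentences, random.Random(1))
print(shuffled)
print(restore_order(shuffled, order) == sentences)
```

A model that predicts `order` correctly for every sentence has reproduced the original document order.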
- Given two sentences/clauses, the task is to identify an appropriate discourse marker, such as "while", "and".
- Folder: `dissent`. To run the code, see the examples in `run.sh`.
- Data for EN and DE are provided at `dissent/data/`. There is no data for ES.
- Some samples of ZH data are provided. Because of the redistribution policy, you first need to request the CDTB from the relevant author. You can then extract the data by running `rst/prepare_data/extract_chinese_dtb.ipynb`.
- Nuclearity prediction: For a given ordered pairing of (potentially complex) EDUs which are connected by an unspecified relation, predict the nucleus/satellite status of each.
- Relation prediction: For a given ordered pairing of (potentially complex) EDUs which are connected by an unspecified relation, predict the relation that holds between them.
- Folder: `rst`. To run the code, see the examples in `run_nuc.sh` and `run_rel.sh`.
- Some samples of EN, ES, DE, and ZH are provided. For ES, DE, and ZH, you can use `rst/prepare_data/extract_chinese_dtb.ipynb`, `rst/prepare_data/extract_german_dtb.ipynb`, and `rst/prepare_data/extract_spanish_dtb.ipynb` to extract the data (after downloading the corresponding discourse treebank). No code is provided for extracting EN data.
- Chunk a concatenated sequence of EDUs into its component EDUs.
- Folder: `segment`. To run the code, see the examples in `run.sh`.
- Some samples of EN, ES, DE, and ZH are provided. For ES, DE, and ZH, you can use `extract_chinese_dtb.ipynb`, `extract_german_dtb.ipynb`, and `extract_spanish_dtb.ipynb` to extract the data (after downloading the corresponding discourse treebank). No code is provided for extracting EN data.
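
One common way to cast EDU segmentation as a labelling problem is to mark the last token of every EDU as a boundary. The sketch below shows that encoding and its inverse. It assumes whitespace tokenization and end-of-EDU boundary labels, neither of which is confirmed by this README (whitespace splitting would not suit ZH, for instance), and both function names are hypothetical.

```python
def edus_to_boundary_labels(edus):
    """Concatenate EDUs into one token sequence with 0/1 boundary labels.

    Label 1 marks the final token of an EDU (an assumed convention).
    """
    tokens, labels = [], []
    for edu in edus:
        words = edu.split()
        if not words:
            continue  # skip empty EDUs
        tokens.extend(words)
        labels.extend([0] * (len(words) - 1) + [1])
    return tokens, labels


def labels_to_edus(tokens, labels):
    """Rebuild the EDUs from a token sequence and its boundary labels."""
    edus, current = [], []
    for token, label in zip(tokens, labels):
        current.append(token)
        if label == 1:
            edus.append(" ".join(current))
            current = []
    if current:  # trailing tokens with no closing boundary
        edus.append(" ".join(current))
    return edus


edus = [
    "Although it was raining,",
    "we went for a walk",
    "because the dog insisted.",
]
tokens, labels = edus_to_boundary_labels(edus)
print(labels)
print(labels_to_edus(tokens, labels))
```

With this encoding, a probe only needs to predict one binary label per token to recover the component EDUs.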
- Given a 4-sentence story context, pick the best ending from two possible options (Mostafazadeh et al., 2016).
- Folder: `cloze`. To run the code, see the examples in `run.sh`.
- This probing task is only for EN. First, please request the data from Mostafazadeh et al. (2016), then use `cloze/prepare_data.ipynb` to prepare the data. Some samples are provided in the folder `cloze/data`.
Once all experiments have been run, the following post-processing notebooks can be used:
- `post_process.ipynb`: extracts the mean and standard deviation of all experiments from 3 different runs.
- `plot_across_model.ipynb`: creates Figure 2 in the paper.
- `plot_across_langauges.ipynb`: creates Figure 3 in the paper.
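
For readers who want to aggregate results outside the notebook, the mean and standard deviation over 3 runs can be computed as in the sketch below. The model names and scores are made-up placeholders, not results from the paper, and the use of the sample standard deviation (rather than the population one) is an assumption about what `post_process.ipynb` reports.

```python
from statistics import mean, stdev

# Hypothetical accuracies from 3 runs per model (placeholder numbers).
runs = {
    "model_a": [0.71, 0.73, 0.72],
    "model_b": [0.65, 0.66, 0.64],
}


def summarize(scores):
    """Return (mean, sample standard deviation) of a list of run scores."""
    return mean(scores), stdev(scores)


summary = {name: summarize(scores) for name, scores in runs.items()}
for name, (avg, sd) in summary.items():
    print(f"{name}: {avg:.3f} +/- {sd:.3f}")
```

Swapping `stdev` for `statistics.pstdev` gives the population standard deviation instead, which is slightly smaller for only 3 runs.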