This repository contains code for the paper "Contextual Distortion Reveals Constituency: Masked Language Models are Implicit Parsers" in ACL 2023.
Since the datasets are not publicly available (e.g. PTB and SPMRL requires license), we cannot include them in the repo. We provide samples of the data format in the data folder.
You can follow the command to parse sentences. Note that it is a straight-forward implementation which processes sentences one-by-one. In the next section, we provide scripts that process batch-by-batch to make the process faster.
python perturb_parse.py --model_name bert-base-uncased --treebank_path ../data/ptb/ptb-dev.txt --pred_tree_path pred_tree
Simply replace the model_name
argument with the model in Huggingface. For example, if we are using roberta-base
, we just need to change the model_name
to roberta-base
. layer
specifies the specific layer of pre-trained model from which the representation is used. Use bert-base-multilingual-uncased
for SPMRL dataset.
- Use the following script to obtain distortion scores from the substitution perturbation.
python compute_sub.py --model_name bert-base-uncased --treebank_path ../data/ptb/ptb-dev.txt --layer 12 --score_path bert-base-ptb-dev
- Use the following script to obtain distortion scores from the decontextualization perturbation.
python compute_dc.py --model_name bert-base-uncased --treebank_path ../data/ptb/ptb-dev.txt --layer 12 --score_path bert-base-ptb-dev
- Use the following script to obtain distortion scores from the movement perturbation.
python compute_move.py --model_name bert-base-uncased --treebank_path ../data/ptb/ptb-dev.txt --layer 12 --score_path bert-base-ptb-dev
Run the following script to normalize distortion scores.
python normalize_scores.py --score_path bert-base-ptb-dev --layer 12
Run the following scripts to do decoding and evaluation using the normalized score.
python parse_eval.py --score_path ./normalized_scores/bert-base-ptb_10 --pred_tree_path pred_tree --treebank_path ../data/ptb/ptb-dev-sample.txt
If you found this work usefule, please cite
@inproceedings{li2023contextual,
title={Contextual Distortion Reveals Constituency: Masked Language Models are Implicit Parsers},
author={Li, Jiaxi and Lu, Wei},
booktitle={Proceedings of ACL},
year={2023}
}