This repository contains the code and data used in the paper "UnNatural Language Inference", to appear in ACL 2021 (Long paper).
To reproduce our experiments, follow the steps outlined below. We use PyTorch Hub checkpoints (trained with fairseq) to run inference with the RoBERTa and BART models. For DistilBERT, we train the model using code derived from the ANLI repository. For the pre-Transformer models, we train using the InferSent repository.
- Install PyTorch 1.7.0:

```bash
conda install pytorch==1.7.0 torchvision==0.8.0 -c pytorch
```

- Install the dependencies:

```bash
pip install -r requirements.txt
pip install ray[tune]
```

- Download the spaCy English and Chinese tokenizers:

```bash
python -m spacy download en_core_web_md
python -m spacy download zh_core_web_lg
```
NB: for training, please configure NVIDIA Apex on your system to enable fp16 training; a quick import check is sketched below.
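If you are unsure whether Apex is set up correctly, a minimal import check such as the following can help. This is our sketch, not part of the repo:

```python
# Optional sanity check: verify that apex.amp is importable before
# launching fp16 training.
try:
    from apex import amp  # noqa: F401
    print("Apex amp found; fp16 training can be enabled.")
except ImportError:
    print("Apex not found; install it from https://github.com/NVIDIA/apex")
```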
In `config.yaml`, first set `outp_path` to the directory where you will download our files.
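As a quick sanity check, you can confirm that the configured path is usable. This sketch assumes only the `outp_path` key described above; the check itself is not part of the repo:

```python
# Sketch: load config.yaml and make sure outp_path exists and is writable.
# Only the outp_path key comes from the README; the rest is illustrative.
import os
import yaml  # pip install pyyaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

outp_path = config["outp_path"]
os.makedirs(outp_path, exist_ok=True)
print(f"Evaluation artifacts will be written under {outp_path}")
```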
We provide the list of models used in the paper below. There are three model types: `Hub` for PyTorch Hub models, `HF` for trained HuggingFace models, and `RNN` for InferSent models. For PyTorch Hub models you do not need to download anything manually; fairseq takes care of it (see the snippet after the tables).
MNLI models:

Model | Type | Size | Location |
---|---|---|---|
roberta.large.mnli | Hub | - | Pytorch Hub |
bart.large.mnli | Hub | - | Pytorch Hub |
distilbert.mnli | HF | 147M | Download |
OCNLI models:

Model | Type | Size | Location |
---|---|---|---|
roberta.large.ocnli | HF | 729M | Download |
InferSent-style RNN models:

Model | Type | Size | Location |
---|---|---|---|
infersent.mnli | RNN | 1.1G | Download |
convnet.mnli | RNN | - | (same download) |
blstmprojencoder.mnli | RNN | - | (same download) |

(The single 1.1G download contains the folders and files for all three models.)
Maximum entropy (ME) models:

Model | Type | Size | Location |
---|---|---|---|
roberta.large.me.mnli | HF | 794M | Download |
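For reference, fetching and querying a `Hub` checkpoint follows the standard fairseq PyTorch Hub interface (this snippet is generic fairseq usage, not repo-specific code):

```python
# Standard fairseq / PyTorch Hub usage: the checkpoint is downloaded and
# cached automatically on first use.
import torch

roberta = torch.hub.load('pytorch/fairseq', 'roberta.large.mnli')
roberta.eval()

# Encode a (premise, hypothesis) pair and predict the MNLI label.
tokens = roberta.encode('A dog is running in the park.',
                        'An animal is outdoors.')
prediction = roberta.predict('mnli', tokens).argmax().item()
print(prediction)  # 0: contradiction, 1: neutral, 2: entailment
```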
Download and build the original MNLI, SNLI, ANLI and OCNLI data following the code in ANLI, which places the data in `anli/build/{data name}`. You can use the following script to prepare the datasets in one go:

```bash
cd anli
source setup.sh
bash ./anli/script/download_data.sh
python src/dataset_tools/build_data.py
```
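To verify the build succeeded, you can count the examples in one of the generated files. The file name below is an assumption about the build output; adjust it to what you actually find under `anli/build/`:

```python
# Sketch: count examples in a built NLI file. The path is hypothetical;
# build_data.py determines the actual layout under anli/build/.
import json

path = "anli/build/mnli/m_dev.jsonl"  # hypothetical file name
with open(path) as f:
    examples = [json.loads(line) for line in f if line.strip()]
print(f"{len(examples)} examples; fields: {sorted(examples[0].keys())}")
```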
We use PyTorch fairseq, the ANLI scripts, and the InferSent repository scripts to run evaluation on the word-shuffled data and compute our metrics. Below we provide the steps to evaluate in each of these settings.
- For each dataset, the script first generates the word-shuffled corpus at `{outp_path}/{eval_data}/rand_100_p_1.0_k_0.0_stop_False_punct_True/rand.jsonl` (a sketch of this kind of shuffling follows this list).
- For each of the cases below, once evaluation is complete you can view the outputs in `{outp_path}/{eval_data}/rand_100_p_1.0_k_0.0_stop_False_punct_True/outputs`.
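The directory name encodes the shuffling settings. For intuition, the kind of word shuffling being generated looks roughly like the sketch below (illustrative only; `main.py` produces `rand.jsonl` for you):

```python
# Illustrative sketch of the word shuffling evaluated in the paper:
# randomly permute the tokens of a premise/hypothesis. This is not the
# repo's generation code.
import random

def shuffle_words(sentence: str, rng: random.Random) -> str:
    tokens = sentence.split()
    rng.shuffle(tokens)
    return " ".join(tokens)

rng = random.Random(42)
for _ in range(3):
    print(shuffle_words("The cat sat on the mat .", rng))
```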
- Models: `roberta.large.mnli`, `bart.large.mnli`
- Run evaluation, where `eval_data` can be any one of the data names listed above. For example, to run evaluation on the MNLI Matched development set:

  ```bash
  python main.py --eval_data mnli_m_dev --model_type hub --model_name roberta.large.mnli
  ```

  Note: you can adjust the `batch_size` parameter in `config.yaml` for faster inference.
- You can use any fairseq or PyTorch Hub model trained on MNLI to explore similar setups. (A sketch for aggregating the resulting outputs follows this list.)
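Once the outputs are written, aggregating them is straightforward. The sketch below computes the fraction of examples for which at least one permutation is classified correctly; the file name and the `uid`, `gold_label`, and `predicted_label` fields are all assumptions about the output format, so adapt them to the files you actually find:

```python
# Sketch: fraction of examples with at least one correctly classified
# permutation. All names here are assumptions about the outputs format.
import json
from collections import defaultdict

any_correct = defaultdict(bool)
with open("outputs/predictions.jsonl") as f:  # hypothetical file name
    for line in f:
        row = json.loads(line)
        any_correct[row["uid"]] |= row["predicted_label"] == row["gold_label"]

frac = sum(any_correct.values()) / len(any_correct)
print(f"Examples with >= 1 correctly classified permutation: {frac:.3f}")
```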
- Models: `distilbert.mnli`, `roberta.large.ocnli`
- Download the appropriate model from the table above and place it in the `anli/models` folder.
- For DistilBERT, run `./scripts/distilbert_eval.sh`.
- For OCNLI, run `./scripts/ocnli_eval.sh`.
- You can use any model trained with HuggingFace Transformers (as done in the ANLI repository) by following the above scripts. To add your own model, add the path to your trained model in `model_store`. (A minimal inference sketch follows this list.)
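For orientation, minimal HuggingFace inference for a local sequence-classification checkpoint such as the `distilbert.mnli` download looks like this. The directory name is a hypothetical path to the download above, and the label order depends on the checkpoint's `id2label` config; the repo's scripts handle all of this for you:

```python
# Sketch, not the repo's evaluation code: run one NLI pair through a
# local HuggingFace sequence-classification checkpoint.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_dir = "anli/models/distilbert.mnli"  # hypothetical local path
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir).eval()

inputs = tokenizer("A man is playing a guitar.", "A person makes music.",
                   return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(probs, model.config.id2label)
```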
- Models: `infersent.mnli`, `convnet.mnli`, `bilstmproj.mnli`
- Download the appropriate model from the table above and extract it in the root folder (you should end up with an `rnn_models` directory in the main folder).
- InferSent evaluation requires a slightly different data format. For MNLI and OCNLI, we provide `run.sh` in `dataset_utils/mnli/` and `dataset_utils/ocnli/` to convert the files appropriately. These scripts require `rnn_models` to exist in the main folder.
- Run evaluation, where `eval_data` can be any one of the data names listed above. For example, to run evaluation on the MNLI Matched development set with the InferSent model:

  ```bash
  python main.py --eval_data mnli_m_dev --model_type rnn_infersent --model_name infersent.mnli
  ```
- Model types are defined in `config.yaml`.
- An example script for running other models is available in `scripts/transformer_eval.sh`.
You can use the table script to generate the aggregate table as in the paper:

```bash
python codes/tables.py --data_loc <outp_path> --datas mnli_m_dev --models roberta.large.mnli
```
- To add more rows to the table, pass multiple datasets to `--datas` and multiple models to `--models`, each as a comma-separated list. Check the defaults in `codes/tables.py` for more information.
You can use the plotting script to generate the plots used in the paper:

```bash
python codes/plot.py --plot entropy
```
To run the POS Tree Analysis experiments, run the following script, which first computes the training statistics on the MNLI training data and then runs the POS Tree analysis on the model outputs:

```bash
python codes/min_tree.py -k 4 --loc data/ --compute_train_stats --mp
```

If you have already computed the training statistics (output file at `data/mnli_m_dev/min_tree_train.pkl`), omit the `--compute_train_stats` flag.
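For intuition, this analysis is built on POS-tag sequences of the kind spaCy produces (one reason the spaCy models above are required). A minimal, illustrative example of extracting such a sequence; this is not the repo's code:

```python
# Illustrative only: extract a POS-tag sequence with spaCy. The repo's
# min_tree.py computes its own statistics on top of inputs like this.
import spacy

nlp = spacy.load("en_core_web_md")
doc = nlp("The cat sat on the mat.")
print([(token.text, token.pos_) for token in doc])
```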
For reproducibility, you can check your results against the results published in the paper.
If you find our work useful in your research, please cite us using the following bibtex.
```bibtex
@article{sinha2021unnat,
  title   = {{UnNatural Language Inference}},
  author  = {Koustuv Sinha and Prasanna Parthasarathi and Joelle Pineau and Adina Williams},
  journal = {ACL},
  year    = {2021},
  url     = {https://arxiv.org/abs/2101.00010},
  arxiv   = {2101.00010},
}
```
- We thank the amazing ANLI repository, which we use for various prototyping and experiments.
- We also use the InferSent repository for the non-Transformer experiments.
- Terms of Use: https://opensource.facebook.com/legal/terms
- Privacy Policy: https://opensource.facebook.com/legal/privacy
UNLU is licensed under the Creative Commons Non-Commercial 4.0 license. See the LICENSE file for more details.