Time is a crucial factor in real-world question answering (QA) problems. However, language models have difficulty understanding the relationships between time specifiers, such as 'after' and 'before', and numbers, since existing QA datasets do not include sufficient time expressions. To address this issue, we propose a Time-Context aware Question Answering (TCQA) framework. We suggest a Time-Context dependent Span Extraction (TCSE) task and build a time-context dependent data generation framework for model training. Moreover, we present a metric that uses TCSE to evaluate the time awareness of a QA model. The TCSE task consists of a question and four sentence candidates, each classified as correct or incorrect with respect to time and context. The model is trained to extract the answer span from the sentence that is correct in both time and context. The model trained with TCQA outperforms baseline models by up to 8.5 F1 points on the TimeQA dataset.
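To make the TCSE task description more concrete, here is a minimal Python sketch of one training instance. It is an illustration only: the `Candidate` schema, the `target_span` helper, and the toy sentences are all hypothetical and are not taken from this repository. It also assumes that the four candidates cover the four combinations of time-correct/incorrect and context-correct/incorrect, which the description above suggests but does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """One sentence candidate (hypothetical schema, not the repository's format)."""
    sentence: str
    time_correct: bool     # sentence satisfies the time constraint in the question
    context_correct: bool  # sentence is about the subject the question asks about


def target_span(candidates: List[Candidate], answer: str) -> Optional[Tuple[int, int, int]]:
    """Return (candidate index, start char, end char) of the answer inside the
    candidate that is correct in both time and context, or None if absent."""
    for i, cand in enumerate(candidates):
        if cand.time_correct and cand.context_correct:
            start = cand.sentence.find(answer)
            if start >= 0:
                return i, start, start + len(answer)
    return None


# Made-up toy instance: one question, four candidates.
question = "Which team did Alex play for after 2010?"
candidates = [
    Candidate("Alex played for Team A from 2011 to 2014.", True, True),
    Candidate("Alex played for Team B from 2005 to 2008.", False, True),
    Candidate("Jordan played for Team C from 2011 to 2014.", True, False),
    Candidate("Jordan played for Team D from 2005 to 2008.", False, False),
]

idx, start, end = target_span(candidates, "Team A")
print(idx, candidates[idx].sentence[start:end])
```

Only the first candidate is correct in both time and context, so the span supervision points into that sentence. The other three act as distractors that differ in time, in context, or in both.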
- dataset/: question-context templates for generating TCSE data
- Model/: code for model training and evaluation
- generate_TCSE.py: code for generating the TCSE dataset from the question-context templates
- TC_score.py: code for calculating the TC-score from an output file
- Python 3.8.2
- PyTorch 1.10.2+cu113
- transformers 4.10.2
To generate the TCSE data, complete the following three steps in order.
- Generate context template
python generate_TCSE.py --loadtc False --loadq False --data train
- Generate question template using context
python generate_TCSE.py --loadtc True --loadq False --data train
- Generate TCSE dataset using question-context template
python generate_TCSE.py --loadtc True --loadq True --data train
BigBird
python -m Model.main model_id=nq dataset=hard cuda=0 mode=train TCSE=True k=1.0 CRL=True k_crl=0.5
BERT, RoBERTa, ALBERT
python -m Model.main model_id=[bertbase or robertabase or albertbase] dataset=hard mode=eval use_bert=True max_sequence_length=512 doc_stride=256 TCSE=True k=1.0 CRL=True k_crl=1.0
BigBird
python -m Model.main model_id=nq dataset=hard cuda=0 mode=eval model_path=[YOUR_MODEL]
BERT, RoBERTa, ALBERT
python -m Model.main model_id=[bertbase or robertabase or albertbase] dataset=hard mode=eval use_bert=True max_sequence_length=512 doc_stride=256 model_path=[YOUR_MODEL]
- Get output file by removing null classifier
python -m Model.main model_id=nq dataset=hard cuda=0 mode=eval model_path=[YOUR_MODEL] --TCAS True
- Calculate TC-score
python TC_score.py --predict_path [OUTPUT FILE]
We referred to https://github.com/wenhuchen/Time-Sensitive-QA to implement the preprocessing code for the TimeQA benchmark dataset.
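The three data-generation commands listed earlier have to run in a fixed order, because each later step loads what the previous one produced (`--loadtc True`, then `--loadq True`). As a convenience, they could be chained from Python roughly as follows. This is a sketch under assumptions: `generation_commands` and `run_generation` are hypothetical helper names, not part of this repository, and only the script name and flags shown in the commands above are used.

```python
import subprocess
import sys
from typing import List


def generation_commands(split: str = "train") -> List[List[str]]:
    """Build the three generate_TCSE.py invocations in the required order."""
    stages = [
        ("False", "False"),  # 1. generate the context template
        ("True", "False"),   # 2. load it and generate the question template
        ("True", "True"),    # 3. load both and generate the TCSE dataset
    ]
    return [
        [sys.executable, "generate_TCSE.py",
         "--loadtc", loadtc, "--loadq", loadq, "--data", split]
        for loadtc, loadq in stages
    ]


def run_generation(split: str = "train") -> None:
    """Run the stages one after another; check=True stops at the first failure."""
    for cmd in generation_commands(split):
        subprocess.run(cmd, check=True)


# Print the planned commands without executing them.
for cmd in generation_commands("train"):
    print("python", " ".join(cmd[1:]))
```

Calling `run_generation("train")` from the repository root would then execute the three steps in sequence and stop early if any step exits with an error, so a later step never runs on missing templates.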