Code and datasets for our paper "Enhancing Aspect Term Extraction with Soft Prototypes", accepted at EMNLP 2020.
- python 3.6.7
- pytorch 1.5.0
- pytorch-pretrained-bert 0.4.0
- numpy 1.19.1
The training and evaluation of SoftProto are both implemented in train_softproto.py. Run it as below; the --lm argument specifies the type of pre-trained language model.
CUDA_VISIBLE_DEVICES=0 python train_softproto.py --dataset res14 --lm external --seed 123
Alternatively, you can run the shell script to get all results for a given dataset.
sh res14_run.sh
Since the training/validation split is sampled randomly at run time, the exact experimental results may vary across machines. However, if you run the shell script and collect the results of all settings, you will find that the improvements brought by SoftProto are quite stable.
We re-ran the experiments on a machine different from the one used in our paper and list the results below. The corresponding log files are in the ./log/ folder.
Each dataset consists of the following files:
- sentence.txt contains the tokenized review sentences.
- target.txt contains the aspect term tag sequences (0=O, 1=B, 2=I); a loading sketch is given after this list.
- internal_forward/backward_top10.txt contains the top-10 oracle words in SoftProtoI.
- external_forward/backward_top10.txt contains the top-10 oracle words in SoftProtoE.
- bert_base_top10.txt contains the top-10 oracle words in SoftProtoB (BASE).
- bert_pt_top10.txt contains the top-10 oracle words in SoftProtoB (PT).
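For reference, below is a minimal loading sketch, assuming sentence.txt and target.txt are line-aligned with tokens and tags separated by whitespace (the file names come from the list above; everything else is an illustrative assumption, not the repository's actual data loader):

```python
# Minimal sketch: load the aligned sentence/tag files and decode B/I spans.
# Assumption: sentence.txt and target.txt are line-aligned and whitespace-separated.
def load_dataset(sentence_path="sentence.txt", target_path="target.txt"):
    with open(sentence_path, encoding="utf-8") as f_sent, \
         open(target_path, encoding="utf-8") as f_tag:
        for sent_line, tag_line in zip(f_sent, f_tag):
            tokens = sent_line.split()
            tags = [int(t) for t in tag_line.split()]  # 0=O, 1=B, 2=I
            assert len(tokens) == len(tags)
            yield tokens, tags

def extract_aspect_terms(tokens, tags):
    """Group consecutive B/I tags into aspect term strings."""
    terms, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == 1:                      # B: start a new aspect term
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif tag == 2 and current:        # I: extend the current term
            current.append(token)
        else:                             # O: close any open term
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms
```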
We will release a script for generating oracle words with LM/MLM in a few days.
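Until then, here is a rough, illustrative sketch of how top-10 oracle words could be obtained from a masked language model with pytorch-pretrained-bert. It is not the authors' script; the model name (bert-base-uncased) and the single-token masking scheme are assumptions.

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForMaskedLM

# Illustrative sketch: mask one wordpiece at a time and take the MLM's top-10
# predictions as its oracle words. Model name and masking scheme are assumptions.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def top10_oracle_words(sentence, masked_index):
    """Return the top-10 MLM predictions for the wordpiece at `masked_index`."""
    pieces = ["[CLS]"] + tokenizer.tokenize(sentence) + ["[SEP]"]
    pieces[masked_index + 1] = "[MASK]"             # +1 accounts for [CLS]
    input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(pieces)])
    with torch.no_grad():
        scores = model(input_ids)                   # (1, seq_len, vocab_size)
    top_ids = torch.topk(scores[0, masked_index + 1], 10)[1].tolist()
    return tokenizer.convert_ids_to_tokens(top_ids)

print(top10_oracle_words("the food was great but service was slow", 1))
```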
If you find our code and datasets useful, please cite our paper:

@inproceedings{chen2020softproto,
  author    = {Zhuang Chen and Tieyun Qian},
  title     = {Enhancing Aspect Term Extraction with Soft Prototypes},
  booktitle = {EMNLP},
  pages     = {2107--2117},
  year      = {2020},
  url       = {https://www.aclweb.org/anthology/2020.emnlp-main.164/}
}
🏁