Author: Tianze Shi
This repo contains the research code and scripts used in the paper Semantic Role Labeling as Syntactic Dependency Parsing. This README gives a basic overview of the code structure and its major components. For further questions, please contact the authors directly.
The entry point to the package is here. Calls to this package can be chained through the fire CLI. The usual calling order is build-vocab, create-parser, load-embeddings, train, and finally finish.
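The chaining pattern can be sketched as follows. This is a minimal illustration, not the repo's actual entry point: the class ToyParser, its method bodies, and every parameter name (train_file, hidden_dim, path, epochs) are hypothetical. The only idea it relies on is that each command returns the object itself, so fire can apply the next command to the result.

```python
class ToyParser:
    """Hypothetical stand-in for the package's entry-point class."""

    def __init__(self):
        # Records the order in which the commands were called.
        self.steps = []

    def build_vocab(self, train_file="train.txt"):
        self.steps.append("build_vocab")
        return self  # returning self lets the next command chain

    def create_parser(self, hidden_dim=128):
        self.steps.append("create_parser")
        return self

    def load_embeddings(self, path="embeddings.txt"):
        self.steps.append("load_embeddings")
        return self

    def train(self, epochs=1):
        self.steps.append("train")
        return self

    def finish(self):
        self.steps.append("finish")
        # Return a plain string so the CLI has something simple to print.
        return " -> ".join(self.steps)


def main():
    # fire is a third-party package (pip install fire); it is imported here
    # so the class above can be used without it.
    import fire

    fire.Fire(ToyParser)
```

If main() is wired up as the script's entry point, a chained call might look like `python toy.py build-vocab --train_file=train.txt - create-parser - load-embeddings - train --epochs=5 - finish`, where each lone `-` is fire's default separator marking the end of one command's arguments. The file name toy.py and all flags are placeholders.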
An example inference script is here; it calls parser.evaluate(data) after loading the models and embeddings.
The official CoNLL evaluation script is available at https://www.cs.upc.edu/~srlconll/soft.html. The F1 scores displayed during model training are NOT official F1 scores (though they are usually very close).
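For intuition about what an unofficial training-time score looks like, here is a minimal sketch of micro-averaged precision, recall, and F1 over labeled arguments. It is an assumption about the general form of such a metric, not the code used in this repo, and it does not reproduce the official script. The function name micro_f1 and the tuple layout (predicate index, span start, span end, label) are hypothetical.

```python
def micro_f1(gold, pred):
    """Micro-averaged precision, recall, and F1.

    gold and pred are parallel lists with one set per sentence; each set
    holds hashable argument tuples, e.g. (predicate, start, end, label).
    """
    # An argument counts as correct only if the whole tuple matches.
    true_pos = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)

    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# One sentence, two gold arguments; the second prediction has a wrong span end.
gold = [{(1, 0, 1, "ARG0"), (1, 2, 4, "ARG1")}]
pred = [{(1, 0, 1, "ARG0"), (1, 2, 3, "ARG1")}]
print(micro_f1(gold, pred))
```

Numbers reported for comparison with other work should still come from the official script.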
The major parsing module is the Python class SRLDepParser inside this file. Back-and-forth conversion algorithms tuned on OntoNotes 5.0 data are contained in this file.
To speed up loading, the embedding files can be trimmed down to only the vocabulary seen in our data. The trimming script is here.
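The idea behind trimming can be sketched as below. This is not the repo's script: it assumes a GloVe-style text file with one word per line followed by space-separated vector values and no header line, and the function name trim_embeddings and its parameters are hypothetical.

```python
def trim_embeddings(src_path, dst_path, vocab, lowercase=False):
    """Copy only the embedding lines whose word is in vocab.

    Returns the number of lines kept.
    """
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.rstrip("\n")
            # The word is everything before the first space.
            word, sep, _vector = line.partition(" ")
            if not sep:
                continue  # no vector on this line; skip it
            key = word.lower() if lowercase else word
            if key in vocab:
                dst.write(line + "\n")
                kept += 1
    return kept
```

A call such as `trim_embeddings("embeddings.txt", "embeddings.trimmed.txt", vocab)` streams the file line by line, so the full embedding table never needs to fit in memory. The file names here are placeholders.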
Data preparation scripts are in the data_prep folder.
Prerequisite: Stanford CoreNLP with English and Chinese models v3.9.2
- Follow http://cemantix.org/data/ontonotes.html to prepare data, using v12 data release
- For Chinese data (http://conll.cemantix.org/2012/data.html), copy folders under the correct splits
- Use the train, dev, and conll-2012-test splits for English, and the train, dev, and test splits for Chinese
- Run aggregate.sh
- Run space_to_tab.sh
- Run constituency_tree.sh
- Run english_dep_tree.sh and english_fuse.py for English data preparation
- Run chinese_dep_tree.sh and chinese_fuse.py for Chinese data preparation
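The order of the steps above can be captured in a small driver. This is a sketch under several assumptions that the scripts themselves may not satisfy: that they take no arguments, that the .sh files run under bash and the .py files under python, and that they are invoked from the repo root. The names build_commands and run_pipeline are hypothetical.

```python
import subprocess
from pathlib import Path

# Steps shared by both languages, in the order listed above.
COMMON_STEPS = ["aggregate.sh", "space_to_tab.sh", "constituency_tree.sh"]

# Language-specific steps: dependency conversion, then fusing.
LANGUAGE_STEPS = {
    "english": ["english_dep_tree.sh", "english_fuse.py"],
    "chinese": ["chinese_dep_tree.sh", "chinese_fuse.py"],
}


def build_commands(language, script_dir="data_prep"):
    """Return the ordered list of commands for one language."""
    commands = []
    for name in COMMON_STEPS + LANGUAGE_STEPS[language]:
        # Assumption: pick the interpreter from the file extension.
        interpreter = "python" if name.endswith(".py") else "bash"
        commands.append([interpreter, str(Path(script_dir) / name)])
    return commands


def run_pipeline(language, script_dir="data_prep", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(language, script_dir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline("english")` only prints the planned commands, which is a cheap way to check the order before running anything against the data.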