A Python package for reproducing results of fully incremental dependency and constituency parsers described in:
- On the Challenges of Fully Incremental Neural Dependency Parsing at IJCNLP-AACL 2023.
- From Partial to Strictly Incremental Constituent Parsing at EACL 2024.
- Fully Incremental Parsing based on Neural Networks.
Note: Our implementation was built by forking yzhangcs' SuPar v1.1.4 repository. The Vector Quantization module was extracted from lucidrains' vector-quantize-pytorch repository, and the Sequence Labeling encodings from Polifack's CoDeLin repository.
- Dependency Parsing:
  - Sequence Labeling (absolute, relative, PoS-based and bracketing encodings).
  - Transition-based w. Arc-Eager.
- Constituency Parsing:
  - Sequence Labeling (absolute and relative encodings).
  - Attach-Juxtapose.
To reproduce our experiments, follow the installation and deployment steps of the SuPar, vector-quantize-pytorch and CoDeLin repositories. The supported functionalities are training, evaluation and prediction from CoNLL-U or PTB-bracketed files. We strongly recommend running our parsers with terminal commands to train models and generate prediction files. In the future 🙌 we will make SuPar methods available to easily test our parsers' performance from the Python prompt.
Dependency Parsing:
- Sequence Labeling Dependency Parser (SLDependencyParser): Inherits all arguments of the main class Parser and adds the flag --codes to select the encoding used to linearize the trees (abs, rel, pos, 1p, 2p).
Experiment: Train an absolute-encoding parser with mGPT as the encoder and an LSTM layer as the decoder to predict labels.
python3 -u -m supar.cmds.dep.sl train -b -c configs/config-mgpt.ini \
-p ../results/models-dep/english-ewt/abs-mgpt-lstm/parser.pt \
--codes abs --decoder lstm \
--train ../treebanks/english-ewt/train.conllu \
--dev ../treebanks/english-ewt/dev.conllu \
--test ../treebanks/english-ewt/test.conllu
The model configuration (number and size of layers, optimization parameters, encoder selection) is specified in configuration files (see the folder configs/). We provide the main configuration used in our experiments.
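To make the abs and rel options of --codes more concrete, here is a minimal, self-contained sketch of absolute and relative head encodings for sequence labeling. It is an illustration only and not the CoDeLin code used by this repository: the function names `encode_heads` and `decode_heads` are hypothetical, arcs are unlabeled, and the treatment of the root (index 0, encoded as an ordinary offset in the relative case) is an assumption.

```python
from typing import List


def encode_heads(heads: List[int], codes: str = "abs") -> List[str]:
    """Turn a list of head indices into one label per word.

    heads[i] is the 1-based index of the head of word i+1; 0 denotes the root.
    (Hypothetical helper, not part of SuPar or CoDeLin.)
    """
    if codes not in ("abs", "rel"):
        raise ValueError(f"unsupported encoding: {codes}")
    labels = []
    for position, head in enumerate(heads, start=1):
        if codes == "abs":
            # Absolute encoding: the label is the head index itself.
            labels.append(str(head))
        else:
            # Relative encoding: the label is the signed offset to the head.
            labels.append(f"{head - position:+d}")
    return labels


def decode_heads(labels: List[str], codes: str = "abs") -> List[int]:
    """Invert encode_heads, recovering the head index of each word."""
    if codes not in ("abs", "rel"):
        raise ValueError(f"unsupported encoding: {codes}")
    heads = []
    for position, label in enumerate(labels, start=1):
        value = int(label)
        heads.append(value if codes == "abs" else position + value)
    return heads


# "She reads books": She <- reads, reads <- root, books <- reads
heads = [2, 0, 2]
print(encode_heads(heads, "abs"))  # ['2', '0', '2']
print(encode_heads(heads, "rel"))  # ['+1', '-2', '-1']
```

Both encodings assign exactly one label per word, which is what lets a tagger-style decoder predict a tree one token at a time.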
- Transition-based Dependency Parser w. Arc-Eager (ArcEagerDependencyParser): Inherits the same arguments as the main class Parser.
Experiment: Train an Arc-Eager parser using BLOOM-560M as the encoder and an MLP-based decoder to predict transitions, with delay (--delay) and Vector Quantization (--use_vq).
python3 -u -m supar.cmds.dep.eager train -b -c configs/config-bloom560.ini \
-p ../results/models-dep/english-ewt/eager-bloom560-mlp/parser.pt \
--decoder=mlp --delay=1 --use_vq \
--train ../treebanks/english-ewt/train.conllu \
--dev ../treebanks/english-ewt/dev.conllu \
--test ../treebanks/english-ewt/test.conllu
This will save the following files in the folder results/models-dep/english-ewt/eager-bloom560-mlp:
- parser.pt: Trained PyTorch model.
- metrics.pickle: Python object with the evaluation on the test set.
- pred.conllu: Parser predictions for the CoNLL-U test file.
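For readers unfamiliar with the transition system behind this parser, the following is a minimal sketch of the textbook Arc-Eager system applied to a given transition sequence. It is not the repository's implementation: the function `run_arc_eager` and the action names are hypothetical, arcs are unlabeled, and the delay and Vector Quantization options are not modeled.

```python
from typing import Dict, List


def run_arc_eager(n_words: int, transitions: List[str]) -> Dict[int, int]:
    """Apply Arc-Eager transitions and return a {dependent: head} map.

    Words are numbered 1..n_words and 0 is the artificial root.
    (Hypothetical helper for illustration only.)
    """
    stack = [0]                              # starts with the root
    buffer = list(range(1, n_words + 1))     # words still to be read
    heads: Dict[int, int] = {}
    for action in transitions:
        if action == "SHIFT":
            if not buffer:
                raise ValueError("SHIFT needs a non-empty buffer")
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # Head is the first buffer word, dependent is the stack top.
            if not buffer or not stack:
                raise ValueError("LEFT-ARC needs a stack top and a buffer word")
            top = stack[-1]
            if top == 0 or top in heads:
                raise ValueError("LEFT-ARC needs a headless, non-root stack top")
            heads[top] = buffer[0]
            stack.pop()
        elif action == "RIGHT-ARC":
            # Head is the stack top, dependent is the first buffer word.
            if not buffer or not stack:
                raise ValueError("RIGHT-ARC needs a stack top and a buffer word")
            heads[buffer[0]] = stack[-1]
            stack.append(buffer.pop(0))
        elif action == "REDUCE":
            if not stack or stack[-1] not in heads:
                raise ValueError("REDUCE needs a stack top that already has a head")
            stack.pop()
        else:
            raise ValueError(f"unknown transition: {action}")
    return heads


# "She reads books"
print(run_arc_eager(3, ["SHIFT", "LEFT-ARC", "RIGHT-ARC", "RIGHT-ARC"]))
# {1: 2, 2: 0, 3: 2}
```

Each transition consumes at most one word from the buffer, so the tree is built left to right as the sentence is read.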
Constituency Parsing
- Sequence Labeling Constituency Parser (SLConstituencyParser): Analogously to SLDependencyParser, it accepts the flag --codes to select the indexing to use (abs, rel).
python3 -u -m supar.cmds.const.sl train -b -c configs/config-mgpt.ini \
-p ../results/models-con/ptb/abs-mgpt-lstm/parser.pt \
--codes abs --decoder lstm \
--train ../treebanks/ptb-gold/train.trees \
--dev ../treebanks/ptb-gold/dev.trees \
--test ../treebanks/ptb-gold/test.trees
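As a rough illustration of the abs and rel indexing, the sketch below assumes the encodings follow the common-ancestor scheme often used for constituent parsing as sequence labeling: each word is labeled with the number of ancestors it shares with the next word, plus the symbol of their lowest common ancestor. This assumption is not confirmed by the text above, the helper names are hypothetical, and unary chains and the label of the last word are left out for brevity.

```python
from typing import List, Tuple


def leaf_ancestors(tree: str) -> List[List[Tuple[int, str]]]:
    """Return, for each word, its ancestors as (node_id, symbol) pairs.

    The PoS preterminal directly above the word is excluded.
    """
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    stack: List[Tuple[int, str]] = []
    paths: List[List[Tuple[int, str]]] = []
    next_id, i = 0, 0
    while i < len(tokens):
        if tokens[i] == "(":
            # The token right after "(" is the node symbol.
            stack.append((next_id, tokens[i + 1]))
            next_id += 1
            i += 2
        elif tokens[i] == ")":
            stack.pop()
            i += 1
        else:
            # A word: record every open node except its preterminal.
            paths.append(stack[:-1])
            i += 1
    return paths


def encode_tree(tree: str, codes: str = "abs") -> List[Tuple[int, str]]:
    """Label each word except the last with (level, lowest common ancestor)."""
    if codes not in ("abs", "rel"):
        raise ValueError(f"unsupported encoding: {codes}")
    paths = leaf_ancestors(tree)
    labels, previous = [], 0
    for left, right in zip(paths, paths[1:]):
        common = 0
        for a, b in zip(left, right):
            if a != b:
                break
            common += 1
        symbol = left[common - 1][1]
        # abs: number of shared ancestors; rel: change w.r.t. the previous word.
        level = common if codes == "abs" else common - previous
        labels.append((level, symbol))
        previous = common
    return labels


tree = "(S (NP (PRP She)) (VP (VBZ reads) (NP (NNS books))))"
print(encode_tree(tree, "abs"))  # [(1, 'S'), (2, 'VP')]
print(encode_tree(tree, "rel"))  # [(1, 'S'), (1, 'VP')]
```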
- Attach-Juxtapose Constituency Parser (AttachJuxtaposeConstituencyParser): Extends the original SuPar implementation with the delay (--delay) and Vector Quantization (--use_vq) flags:
python3 -u -m supar.cmds.const.aj train -b -c configs/config-bloom560.ini \
-p ../results/models-con/ptb/aj-bloom560-mlp/parser.pt \
--delay=2 --use_vq \
--train ../treebanks/ptb-gold/train.trees \
--dev ../treebanks/ptb-gold/dev.trees \
--test ../treebanks/ptb-gold/test.trees
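When launching several of the training runs above, it can be convenient to assemble the commands programmatically. The sketch below only rebuilds the argument lists shown in this README; the helper `build_train_command` is hypothetical, and it assumes the treebank folder contains train, dev and test files with a shared extension, as in the examples.

```python
import shlex
from typing import List, Optional


def build_train_command(module: str, config: str, model_path: str,
                        treebank_dir: str, extension: str,
                        extra: Optional[List[str]] = None) -> List[str]:
    """Assemble a training command using only flags shown in this README.

    (Hypothetical helper; it does not run anything by itself.)
    """
    cmd = ["python3", "-u", "-m", module, "train", "-b",
           "-c", config, "-p", model_path]
    cmd += list(extra or [])
    for split in ("train", "dev", "test"):
        cmd += [f"--{split}", f"{treebank_dir}/{split}.{extension}"]
    return cmd


cmd = build_train_command(
    module="supar.cmds.const.aj",
    config="configs/config-bloom560.ini",
    model_path="../results/models-con/ptb/aj-bloom560-mlp/parser.pt",
    treebank_dir="../treebanks/ptb-gold",
    extension="trees",
    extra=["--delay=2", "--use_vq"],
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root to start the run.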
Our code provides two ways to evaluate a model stored in a .pt PyTorch file:
- Via the Python prompt, loading the model with the .load() method and evaluating with .evaluate():
>>> Parser.load('../results/models-dep/english-ewt/abs-mgpt-lstm/parser.pt').evaluate('../data/english-ewt/test.conllu')
- Via terminal commands:
python -u -m supar.cmds.dep.sl evaluate -p --data ../data/english-ewt/test.conllu
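As an independent sanity check of a prediction file, unlabeled and labeled attachment scores (UAS and LAS) can be computed directly from a gold and a predicted CoNLL-U file. The sketch below is not the repository's evaluation code, and the numbers reported by .evaluate() may differ (for example, in how punctuation is handled). The function names are hypothetical, and both files are assumed to contain the same sentences and tokens in the same order.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]  # one (HEAD, DEPREL) pair per token


def read_conllu(text: str) -> List[Sentence]:
    """Extract (HEAD, DEPREL) pairs per sentence from CoNLL-U content."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        current.append((cols[6], cols[7]))
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(gold: List[Sentence],
                      pred: List[Sentence]) -> Tuple[float, float]:
    """Return (UAS, LAS) over all tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted files differ in sentence count")
    total = uas = las = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for (g_head, g_rel), (p_head, p_rel) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                uas += 1
                if g_rel == p_rel:
                    las += 1
    if total == 0:
        raise ValueError("no tokens to evaluate")
    return uas / total, las / total
```

To use it, read both files with `open(path, encoding="utf-8").read()`, pass each string through `read_conllu`, and call `attachment_scores` on the two results.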
The prediction step can also be run from the Python prompt or via terminal commands to generate a CoNLL-U file:
- Via the Python prompt with the .predict() method:
>>> Parser.load('../results/models-dep/english-ewt/abs-mgpt-lstm/parser.pt').predict(
...     data='../data/english-ewt/abs-mgpt-lstm/test.conllu',
...     pred='../results/models-dep/english-ewt/abs-mgpt-lstm/pred.conllu')
- Via terminal commands:
python -u -m supar.cmds.dep.sl predict -p \
--data ../data/english-ewt/test.conllu \
--pred ../results/models-dep/english-ewt/abs-mgpt-lstm/pred.conllu
This work has been funded by the European Research Council (ERC), under the Horizon Europe research and innovation programme (SALSA, grant agreement No 101100615), ERDF/MICINN-AEI (SCANNER-UDC, PID2020-113230RB-C21), Xunta de Galicia (ED431C 2020/11), Cátedra CICAS (Sngular, University of A Coruña), and Centro de Investigación de Galicia “CITIC”.
@thesis{ezquerro-2023-syntactic,
title = {{Análisis sintáctico totalmente incremental basado en redes neuronales}},
author = {Ezquerro, Ana and Gómez-Rodríguez, Carlos and Vilares, David},
institution = {University of A Coruña},
year = {2023},
url = {https://ruc.udc.es/dspace/handle/2183/33269}
}
@inproceedings{ezquerro-2023-challenges,
title = {{On the Challenges of Fully Incremental Neural Dependency Parsing}},
author = {Ezquerro, Ana and Gómez-Rodríguez, Carlos and Vilares, David},
booktitle = {Proceedings of IJCNLP-AACL 2023},
year = {2023}
}