This implementation is based on the cnn library for this software to function. The reference paper is "In-Order Transition-based Constituent Parsing System"
mkdir build
cd build
cmake .. -DEIGEN3_INCLUDE_DIR=/path/to/eigen
make
We borrow the code get_oracle.py to get top-down oracle
./get_oracle.py [training data in bracketed format] [training data in bracketed format] > [training top-down oracle]
./get_oracle.py [training data in bracketed format] [development data in bracketed format] > [development top-down oracle]
./get_oracle.py [training data in bracketed format] [test data in bracketed format] > [test top-down oracle]
, and then compile pre2mid.cc to get pre2mid to convert them into in-order oracle
g++ pre2mid.cc -o pre2mid
./pre2mid [training top-down oracle] > [training oracle]
./pre2mid [development top-down oracle] > [development oracle]
./pre2mid [test top-down oracle] > [test oracle]
If you require the related data, contact us.
Ensure the related file are linked into the current directory.
mkdir model/
./build/impl/Kparser --cnn-mem 1700 -x -T [training oracle] -d [development oracle] -C [development data in bracketed format] -P -t --pretrained_dim 100 -w [pretrained word embeddings] --lstm_input_dim 128 --hidden_dim 128 -D 0.2
./build/impl/Kparser --cnn-mem 1700 -x -T [training oracle] -p [test oracle] -C [test data in bracketed format] -P --pretrained_dim 100 -w [pretrained word embeddings] --lstm_input_dim 128 --hidden_dim 128 -m [model file]
We provide the trained model file in model
./build/impl/Kparser --cnn-mem 1700 -x -T [training oracle] -p [test oracle] -C [test data in bracketed format] -P --pretrained_dim 100 -w [pretrained word embeddings] --lstm_input_dim 128 --hidden_dim 128 -m [model file] --alpha 0.8 -s 100 > samples.act
./mid2tree.py samples.act > samples.trees
The samples.props could be fed into following reranking components.
Jiangming Liu, jmliunlp@gmail.com