InOrderParser

This implementation is built on the cnn library, which must be available for the software to function. The reference paper is "In-Order Transition-based Constituent Parsing System".

Building

mkdir build
cd build
cmake .. -DEIGEN3_INCLUDE_DIR=/path/to/eigen
make

Data

We use the borrowed script get_oracle.py to generate the top-down oracles:

./get_oracle.py [training data in bracketed format] [training data in bracketed format] > [training top-down oracle]
./get_oracle.py [training data in bracketed format] [development data in bracketed format] > [development top-down oracle]   
./get_oracle.py [training data in bracketed format] [test data in bracketed format] > [test top-down oracle]

Then compile pre2mid.cc into pre2mid and use it to convert the top-down oracles into in-order oracles:

g++ pre2mid.cc -o pre2mid
./pre2mid [training top-down oracle] > [training oracle]
./pre2mid [development top-down oracle] > [development oracle]
./pre2mid [test top-down oracle] > [test oracle]
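
To illustrate what "in-order" means here, the following is a minimal Python sketch, not the logic of pre2mid.cc itself. It assumes the in-order scheme of the reference paper: the first child of a constituent is processed, then the constituent label is projected, then the remaining children follow, and finally the constituent is closed. The action names SHIFT, PJ(X) and REDUCE and the helper names parse_bracketed and in_order_actions are illustrative assumptions; the real oracle files contain more than an action sequence.

```python
def parse_bracketed(text):
    """Parse one bracketed tree into (label, children) tuples.

    A preterminal such as (PRP it) becomes ("PRP", "it").
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                     # skip "("
        label = tokens[pos]
        pos += 1
        if tokens[pos] != "(":       # preterminal: (POS word)
            word = tokens[pos]
            pos += 2                 # skip the word and ")"
            return (label, word)
        children = []
        while tokens[pos] == "(":
            children.append(node())
        pos += 1                     # skip ")"
        return (label, children)

    return node()


def in_order_actions(tree):
    """Emit actions in in-order: first child, label, remaining children, close."""
    label, body = tree
    if isinstance(body, str):        # a word is simply shifted
        return ["SHIFT"]
    actions = in_order_actions(body[0])
    actions.append(f"PJ({label})")   # hypothetical name for the projection action
    for child in body[1:]:
        actions.extend(in_order_actions(child))
    actions.append("REDUCE")
    return actions


small_tree = parse_bracketed("(S (NP (PRP it)) (VP (VBD was)))")
print(" ".join(in_order_actions(small_tree)))
# SHIFT PJ(NP) REDUCE PJ(S) SHIFT PJ(VP) REDUCE REDUCE
```

In a top-down (pre-order) oracle the label would come before the first child instead, which is the difference the conversion step accounts for.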

If you require the related data, contact us.

Training

Ensure the related files are linked into the current directory.

mkdir model/
./build/impl/Kparser --cnn-mem 1700 --training_data [training oracle] --dev_data [development oracle] --bracketing_dev_data [development data in bracketed format] -P -t --pretrained_dim 100 -w [pretrained word embeddings] --lstm_input_dim 128 --hidden_dim 128 -D 0.2

Test

./build/impl/Kparser --cnn-mem 1700 --training_data [training oracle] --test_data [test oracle] --bracketing_dev_data [test data in bracketed format] -P --pretrained_dim 100 -w [pretrained word embeddings] --lstm_input_dim 128 --hidden_dim 128 -m [model file]

The automatically generated file test.eval is the result file.

We provide trained models: the English model with pretrained word embeddings sskip.100.vectors, and the Chinese model with pretrained word embeddings zzgiga.sskip.80.vectors.

Sampling

./build/impl/Kparser --cnn-mem 1700 --training_data [training oracle] --test_data [test oracle] --bracketing_dev_data [test data in bracketed format] -P --pretrained_dim 100 -w [pretrained word embeddings] --lstm_input_dim 128 --hidden_dim 128 -m [model file] --alpha 0.8 -s 100 > samples.act
./mid2tree.py samples.act > samples.trees

The resulting samples.trees can be fed into downstream reranking components.

Easy Usage

Download the model for English.

./build/impl/Kparser-standard --cnn-mem 1700 --model_dir model -w sskip.100.vectors --train_dict train_dict < [stdin] > [stdout]

The standard input should follow the format Word1 POS1 Word2 POS2 ... Wordn POSn. For example:

No RB , , it PRP was VBD n't RB Black NNP Monday NNP . .
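
If the tagged sentence is already available as (word, POS) pairs, a line in this format can be produced with a few lines of Python. This is only a sketch; the helper name to_parser_input is hypothetical and not part of this repository.

```python
def to_parser_input(tagged_tokens):
    """Join (word, POS) pairs into 'Word1 POS1 Word2 POS2 ... Wordn POSn'."""
    fields = []
    for word, pos in tagged_tokens:
        # The format is whitespace-separated, so empty or space-containing
        # fields would shift every following word/POS pair.
        if not word or not pos or any(ch.isspace() for ch in word + pos):
            raise ValueError(f"invalid token: {(word, pos)!r}")
        fields.extend([word, pos])
    return " ".join(fields)


sentence = [("No", "RB"), (",", ","), ("it", "PRP"), ("was", "VBD"),
            ("n't", "RB"), ("Black", "NNP"), ("Monday", "NNP"), (".", ".")]
print(to_parser_input(sentence))
# No RB , , it PRP was VBD n't RB Black NNP Monday NNP . .
```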

The standard output is the tree in bracketed format:

(S (INTJ (RB No)) (, ,) (NP (PRP it)) (VP (VBD was) (RB n't) (NP (NNP Black) (NNP Monday))) (. .))
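
One way to sanity-check a parse is to read the (POS word) leaves back out of the bracketed tree and compare them with the input line. The Python sketch below does this with a regular expression; the helper name leaves is hypothetical, and it assumes that words and tags contain no whitespace or literal parentheses.

```python
import re

# Matches a preterminal "(POS word)": two fields without spaces or brackets.
LEAF = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")


def leaves(tree):
    """Return the (word, POS) pairs of a bracketed tree, left to right."""
    return [(word, pos) for pos, word in LEAF.findall(tree)]


tree = ("(S (INTJ (RB No)) (, ,) (NP (PRP it)) (VP (VBD was) (RB n't) "
        "(NP (NNP Black) (NNP Monday))) (. .))")
recovered = " ".join(f"{word} {pos}" for word, pos in leaves(tree))
print(recovered)
# No RB , , it PRP was VBD n't RB Black NNP Monday NNP . .
```

If the recovered line differs from what was sent on standard input, the tree does not cover the input sentence as given.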

To sample trees, add --samples [number of samples] --a [alpha], for example, --sample 100 --a 0.8

Contact

Jiangming Liu, jmliunlp@gmail.com
