Quick Start

taolei87 edited this page Mar 25, 2015 · 9 revisions

1. Compilation

To compile the project, first run "make" in the directory lib/SVDLIBC to compile the SVD library. Next, make sure you have the Java JDK installed on your machine and find the directory path of the Java JNI include files. The directory should contain the header files jni.h and jni_md.h. Take a look at the shell script make.sh, or use it directly, to compile the rest of the Java code. You have to replace the "jni_path" variable in make.sh with the correct JNI include path. Also, create a "bin" directory in the project directory before running make.sh.
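If you are unsure where the JNI headers live, a small helper like the one below may help. It is only a sketch: the function name find_jni_dirs is made up for this page, and it assumes the JAVA_HOME environment variable points at your JDK, which is a common convention but not something the build requires. On many JDK installations jni_md.h sits in a platform-specific subdirectory next to jni.h, so the helper prints every directory that holds either header.

```shell
# Hypothetical helper (not part of RBGParser): print each directory under
# the given JDK root that contains jni.h or jni_md.h.
find_jni_dirs() {
  jdk_root="$1"
  find "$jdk_root" \( -name jni.h -o -name jni_md.h \) -type f 2>/dev/null \
    | while IFS= read -r header; do dirname "$header"; done \
    | sort -u
}

# Assumption: JAVA_HOME points at the JDK. Nothing is searched if it is unset.
if [ -n "${JAVA_HOME:-}" ]; then
  find_jni_dirs "$JAVA_HOME"
fi
```

The printed directories are candidates for the "jni_path" variable in make.sh.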


2. Data format

We support CoNLL-2006 and CoNLL-2009 data formats, which describe a collection of annotated sentences (and the corresponding gold dependency structures). We assume the dependency trees can be non-projective. See format specifications for more details.
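As a rough illustration of the CoNLL-2006 layout, each token occupies one line with ten tab-separated fields (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), and a blank line separates sentences. The sketch below counts sentences and tokens and flags lines with the wrong number of fields. The function name conll06_summary and the three-token sample (including its tags and labels) are invented for this example, and the check does not apply to CoNLL-2009 files, which use more columns.

```shell
# Hypothetical helper: summarize a CoNLL-2006 file and return non-zero if
# any token line does not have exactly ten tab-separated fields.
conll06_summary() {
  awk -F '\t' '
    /^[ \t]*$/ {
      # A blank line closes the current sentence, if one is open.
      if (in_sent) { sents++; in_sent = 0 }
      next
    }
    NF != 10 { printf "line %d: expected 10 fields, found %d\n", NR, NF; bad = 1 }
    { tokens++; in_sent = 1 }
    END {
      if (in_sent) sents++
      printf "sentences=%d tokens=%d\n", sents, tokens
      exit (bad ? 1 : 0)
    }
  ' "$1"
}

# Made-up sample sentence, written with explicit tabs.
sample=$(mktemp)
printf '1\tJohn\t_\tNNP\tNNP\t_\t2\tSBJ\t_\t_\n'   >  "$sample"
printf '2\tloves\t_\tVBZ\tVBZ\t_\t0\tROOT\t_\t_\n' >> "$sample"
printf '3\tMary\t_\tNNP\tNNP\t_\t2\tOBJ\t_\t_\n'   >> "$sample"
printf '\n' >> "$sample"
conll06_summary "$sample"   # prints: sentences=1 tokens=3
rm -f "$sample"
```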


3. Usage

##### 3.1 Train a model

Take a look at run.sh for an example of running the parser. You can also run the parser manually as follows. First, add the RBGParser directory to the library path so that the parser can find the compiled JNI library used for SVD tensor initialization. Assuming the directory is "/path/to/rbg", this can be done with:

export LD_LIBRARY_PATH="/path/to/rbg:${LD_LIBRARY_PATH}"

After this, we can run the parser:

java -classpath "bin:lib/trove.jar" -Xmx32000m \
  parser.DependencyParser \
  model-file:example.model \
  train train-file:example.train

This will train a model (using default settings) from the training data example.train and save the dependency model to the file example.model.


##### 3.2 Test a model

To test a trained model, you could run the following command:

java -classpath "bin:lib/trove.jar" -Xmx32000m \
  parser.DependencyParser \
  model-file:example.model \
  test test-file:example.test \
  output-file:example.test.out
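To compare the output with the gold annotations yourself, something like the following sketch could be used. It rests on an assumption this page does not confirm: that example.test.out is written in the same CoNLL-2006 column layout as the input, with the same tokens in the same order. The function name uas is made up here. It computes a plain unlabeled attachment score over all tokens, so the number may differ from whatever score the parser itself reports.

```shell
# Hypothetical helper: percentage of tokens whose HEAD field (column 7)
# agrees between a gold file and a predicted file.
# Assumes both files are CoNLL-2006 with identical tokenization.
uas() {
  paste "$1" "$2" | awk -F '\t' '
    # A pasted token line has 10 gold fields followed by 10 predicted fields,
    # so the predicted HEAD is field 17.
    NF == 20 { total++; if ($7 == $17) correct++ }
    END {
      if (total) printf "%.2f\n", 100 * correct / total
      else print "no tokens"
    }
  '
}

# Usage, once both files exist:
#   uas example.test example.test.out
```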

##### 3.3 Tune a model's speed

Currently, RBGParser allows users to automatically tune its speed on a development set without losing too much accuracy. This is done by adding the options "*dev test-file:example.dev*". See the FAQs for more details.

For example, to train a model and then tune it, run:

java -classpath "bin:lib/trove.jar" -Xmx32000m \
  parser.DependencyParser \
  model-file:example.model \
  train train-file:example.train \
  dev test-file:example.dev

The parser will start a tuning procedure after the training is done.

If you already have a trained model, then run:

java -classpath "bin:lib/trove.jar" -Xmx32000m \
  parser.DependencyParser \
  model-file:example.model \
  dev test-file:example.dev

Note that the parser will tune the model and overwrite the model file.


##### 3.4 More options

The parser will train a 3rd-order parser by default. To train a 1st-order (arc-based) model, run the parser like this:

java -classpath "bin:lib/trove.jar" -Xmx32000m \
  parser.DependencyParser \
  model-file:example.model \
  train train-file:example.train \
  dev test-file:example.dev \
  model:basic

The argument "model:basic" specifies the model type (basic: 1st-order features; standard: 3rd-order features; full: additional 3rd-order and high-order global features).

There are many other possible running options. Here is a more complicated example:

java -classpath "bin:lib/trove.jar" -Xmx32000m \
  parser.DependencyParser \
  model-file:example.model \
  train train-file:example.train \
  dev test-file:example.dev \
  output-file:example.dev.out \
  model:standard  C:1.0  iters:5  pruning:false \
  R:20 gamma:0.3 thread:4 converge-test:50

This will run a standard model with regularization C=1.0, number of training iterations iters=5, tensor rank R=20, number of parallel threads thread=4, weight of the tensor component gamma=0.3, number of adaptive hill-climbing restarts during testing converge-test=50, and dependency arc pruning disabled (pruning=false). See RBGParser/src/parser/Options.java for a full list of possible options.


##### 3.5 Using word embeddings

To add unsupervised word embeddings (word vectors) as auxiliary features to the parser, use the option "word-vector:example.embeddings":

java -classpath "bin:lib/trove.jar" -Xmx32000m \
  parser.DependencyParser \
  model-file:example.model \
  train train-file:example.train \
  dev test-file:example.dev \
  model:basic \
  word-vector:example.embeddings

The input file example.embeddings should be a text file specifying the real-valued vectors of different words. Each line of the file should start with the word, followed by a list of real numbers representing the vector of this word. For example:

this 0.01 0.2 -0.05 0.8 0.12
and 0.13 -0.1 0.12 0.07 0.03
to 0.11 0.01 0.15 0.08 0.23
*UNKNOWN* 0.04 -0.14 0.03 0.04 0
...
...

There may be a special word *UNKNOWN* used for OOV (out-of-vocabulary) words. Each line should contain the same number of real numbers.
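Before training, it can be worth checking that an embeddings file follows these rules. The sketch below is one possible check, not part of RBGParser: the function name check_embeddings is invented, and it assumes fields are separated by whitespace, as in the sample above. It verifies that every value after the word looks like a number and that all lines have the same number of values.

```shell
# Hypothetical helper: validate a text embeddings file and print the
# vocabulary size and vector dimension. Returns non-zero on any problem.
check_embeddings() {
  awk '
    NF == 0 { next }   # ignore empty lines
    NF < 2  { printf "line %d: no vector values\n", NR; bad = 1; next }
    {
      for (i = 2; i <= NF; i++) {
        if ($i !~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/) {
          printf "line %d: \"%s\" is not a number\n", NR, $i; bad = 1
        }
      }
      # The first vector fixes the expected dimension for all later lines.
      if (dim == 0) dim = NF - 1
      else if (NF - 1 != dim) {
        printf "line %d: %d values, expected %d\n", NR, NF - 1, dim; bad = 1
      }
      words++
    }
    END { printf "words=%d dim=%d\n", words, dim; exit (bad ? 1 : 0) }
  ' "$1"
}

# Check the four sample lines shown above.
emb=$(mktemp)
cat > "$emb" <<'EOF'
this 0.01 0.2 -0.05 0.8 0.12
and 0.13 -0.1 0.12 0.07 0.03
to 0.11 0.01 0.15 0.08 0.23
*UNKNOWN* 0.04 -0.14 0.03 0.04 0
EOF
check_embeddings "$emb"   # prints: words=4 dim=5
rm -f "$emb"
```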
