AltoLab

Connecting to Alto Lab

You can record runtimes and accuracies in Alto Lab. Use the bash script in scripts/alab for this.

Setup

You can run Alto Lab by executing the shell script in scripts/alab. The script requires that you have set the environment variable ALTO_JAR to the absolute pathname of your executable Alto jar file, e.g.:

export ALTO_JAR=/home/user/blah/alto-2.1.jar

Configuration

alab caches grammars and corpora that it downloads from Alto Lab in the directory ~/.alto. It requires a configuration file at ~/.alto/alto.cfg, which specifies how to connect to Alto Lab. A minimal configuration file looks as follows:

altolab.baseurl=http://bla.com/
altolab.username=uuuuuu
altolab.password=pppppp

Use the same username and password that you would use to log in on the Alto Lab website.

Database structure

Under construction. If you want to run Alto Lab on your own database, feel free to contact us.

Creating a new task

Create new tasks directly in the database. You can add new grammars and corpora in the Alto Lab web interface.

Concerning the code of a task: in each line, a new object is computed; the syntax is 'object = value'. For the value, you can use

  1. previously defined variables,
  2. the '$' symbol for the main grammar (as an InterpretedTreeAutomaton object),
  3. '<abc>' for the input object given by the current corpus instance's interpretation abc (e.g. '<string>' or '<graph>'),
  4. strings surrounded by double underscores (e.g. __teststring__) and numbers in the format 0.001 (no quotation marks or underscores),
  5. functions that are annotated with '@OperationAnnotation(code="xyz")' in the Alto source code (you refer to them with xyz). Constructors and static functions do not need a reference to the class they belong to; their 'code' annotation must be unique within the whole Alto project. Non-static functions are called with a '.', as in '$.filter("graph", <graph>)' (see the example below);
  6. as a shortcut, $.[abc] refers to the interpretation abc of the grammar $ (also works for other IRTG variables that are not '$');
  7. #0, #1, etc. to refer to additional data (see below); ##0 etc. reads the data one line per instance.
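
To make these elements concrete, here is a hedged sketch of a few task lines combining items 1, 3, 4, 6, and 7 (all variable names are invented; the operations on the graph interpretation are borrowed from the full example further below). Item 5 is illustrated by the example that follows.

#!custom
threshold = 0.001
label = __teststring__
labelCopy = label
decomposed = $.[graph].alg.decomp(<graph>)
extraData = #0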

Example of calling an annotated function:

#!custom
filteredIRTG = $.filter("graph", <graph>)

referring to

#!java
    @OperationAnnotation(code = "filter")
    public InterpretedTreeAutomaton filterForAppearingConstants(String interpName, Object input) {
...
}
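
For comparison, a constructor or static function annotated in the same way would be called by its code name alone, without an object and '.' in front of it. A hedged sketch with the invented code name 'makeThing':

#!custom
thing = makeThing(__teststring__, 0.5)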

You can also define an object without specifying its value; the value (or the code to compute it, in the style just described) must then be given when the task is executed. Use the -V option of the alab script for this, e.g. -Vobject1=value1 -Vobject2=value2 if object1 and object2 have unspecified values in the task.

A line starting with 'init' or 'global' is executed once per corpus, not once per instance: 'init' lines run before all instances, 'global' lines after them. If the assignment of a global variable uses non-global variables, the global variable receives the array of all values obtained across the instances. For a line starting with 'export', the value of that variable is entered as a result in the database for every run.
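
For instance, in the hedged sketch below (variable names are invented; the operations are borrowed from the full example further down), the grammar's automaton is retrieved once per corpus, a chart is computed for every instance, and the 'global' line collects all per-instance charts into an array:

#!custom
init grammarAutomaton = $.auto
chart = grammarAutomaton.intersect($.[graph].invhom($.[graph].alg.decomp(<graph>)))
global allCharts = chart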

Additionally, you can measure times with the commands 'starttime xyz' and 'stoptime xyz' where xyz is a name of your choosing.

A full example task:

#!custom
starttime total
automaton = $.auto.intersect($.[graph].invhom($.[graph].alg.decomp(<graph>)))
export derivation_tree
export result = $.[string].alg.eval($.[string].hom.apply(derivation_tree))
stoptime total

Here the value of derivation_tree must be given at runtime.

Running a task

To run the example task, assuming it has, say, ID 90, call

#!custom
alab 90 -Vderivation_tree=automaton.viterbi

For help on further options, run the command with just the --help option; some of these options, such as multithreading and verbose output, can be very useful.

The results are automatically entered into the database.

Additional data

Sometimes you want to use a larger string as a parameter in your task, e.g. when you want to use the same task for different grammars and do not want to create a new task entry in the database every time. When running the task, add the --data x y ... option, where x, y, etc. are the IDs of entries in the additional_data table. In the task definition, the data is referred to with #0, #1, etc., in the order in which their IDs are given in the option (here, #0 would refer to x, and #1 to y).

Example: When a task with the code

#!custom
var1 = #1
var2 = #0

is run with the command-line option -data 25 7, then var1 gets the additional_data entry in the database with ID 7, and var2 the one with ID 25. To read one line of the data per instance of the corpus, use ##0 and ##1 instead.