fast_align

fast_align is a simple, fast, unsupervised word aligner.

If you use this software, please cite:

Chris Dyer, Victor Chahuneau, and Noah A. Smith. (2013). A Simple, Fast, and Effective Reparameterization of IBM Model 2. In Proc. of NAACL.

The source code in this repository is provided under the terms of the Apache License, Version 2.0.

A variant of fast_align is included in the cdec translation system. It uses the same model and produces identical alignments, but it has a few extra features for online alignment with pre-built models.

Input format

Input to fast_align must be tokenized and sentence-aligned. Each line contains a source-language sentence and its target-language translation, separated by a triple pipe symbol (|||) with whitespace on both sides. For example:

doch jetzt ist der Held gefallen . ||| but now the hero has fallen .
neue Modelle werden erprobt . ||| new models are being tested .
doch fehlen uns neue Ressourcen . ||| but we lack new resources .

Compiling and using `fast_align`

fast_align requires only a C++ compiler; it can be compiled by typing `make` at the command-line prompt. Run fast_align without arguments to see a list of command-line options.

The usual way to run fast_align to generate source–target alignments is:

./fast_align -i text.fr-en -d -o -v > forward.align

The usual way to run fast_align to generate target–source alignments is:

./fast_align -i text.fr-en -d -o -v -r > reverse.align

Output

fast_align writes one line of alignments per input sentence pair in the i-j "Pharaoh" format, where a pair i-j indicates that the i-th word of the source sentence is aligned to the j-th word of the target sentence (positions are counted from zero). For example, a good alignment of the example corpus above would be:

0-0 1-1 2-4 3-2 4-3 5-5 6-6
0-0 1-1 2-2 2-3 3-4 4-5
0-0 1-2 2-1 3-3 4-4 5-5

Acknowledgements

The development of this software was sponsored by the U.S. Army Research Laboratory and the U.S. Army Research Office under contract/grant number W911NF-10-1-0533.
