-
Notifications
You must be signed in to change notification settings - Fork 7
/
README_resources
93 lines (66 loc) · 4.03 KB
/
README_resources
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
###############################################
# #
# Linguistic Resources used in this project #
# #
###############################################
Lexica
---------
Note on the semantic dictionaries for Spanish used in this project:
The list of nouns with their corresponding semantic tags has been extracted from the Spanish Resource Grammar in september 2012.
Montserrat Marimon, Natalia Seghezzi and Núria Bel.
An Open-source Lexicon for Spanish, Procesamiento del Lenguaje Natural, n. 39, pp. 131-137. Septiembre, 2007. ISSN 1135-5948.
The original files are available from:
http://www.upf.edu/pdi/iula/montserrat.marimon/spanish_resource_grammar.html
The lexicon of verb frames has been taken from AncoraLex Spanish 2.0.3:
http://clic.ub.edu/corpus/en/ancora-descarregues
see also:
Taulé, M., M.A. Martí, M. Recasens (2009).
AnCora: Multilevel Annotated Corpora for Catalan and Spanish. Proceedings of 6th International Conference on language Resources and Evaluation.
Unfortunately, Ancora is not freely available, you have to register to download the resource.
Every verb comes in its own xml file, in order to read them into a hash, paste them all in one file, e.g. like this:
$ cat *.xml > allVerbs.xml
and then use MT_systems/utilities/readInSemanticDix.pl to store it as hash to a file.
FreeLing 3
-----------
squoia_analyzer needs the FreeLing library for Spanish to be installed, available here:
http://nlp.lsi.upc.edu/freeling/
FreeLing is used for tokenization, morphological analysis and named entity recognition and classification.
The folder FreeLingModules contains some adapted files to be used with FreeLing among others:
- an adapted analyzer module that returns the morphologically analyzed text in crf format
- an adapted dictionary
Wapiti
-----------
Toolkit for sequence labeling, used for tagging the Spanish input sentences in the translation system.
Wapiti is freely available from: https://wapiti.limsi.fr/
Lavergne, T., O. Cappé, F. Yvon (2010).
Practical Very Large Scale CRFs. Proceedings the 48th Annual Meeting of the Association for Computational Linguistics (ACL). Pp. 504–513, Uppsala, Sweden. Association for Computational Linguistics.
MaltParser
-----------
MaltParser is a system for data-driven dependency parsing developed by Johan Hall, Jens Nilsson and Joakim Nivre at Växjö University and Uppsala University, Sweden.
See: http://maltparser.org/
matxin-xfer-lex
---------------------
Lexical transfer module from Matxin (branches), available from:
http://sourceforge.net/projects/matxin/
molif-de
-----------
Finite state morphology tool for German, used for generation of the German output:
https://files.ifi.uzh.ch/cl/siclemat/sprachanalyse/molif/
TrEd - Tree Editor
---------------------
Tool used for the annotation of the Quechua treebank, available from:
http://ufal.mff.cuni.cz/tred/
In order to visualize the Quechua treebank files, install TrEd and copy the stylesheets in trunk/treebanks/quz/stylesheets_for_tred to .tred-stylesheets in your home directory.
Start TrEd, open one of the .pml files (those contain the annotations) and select one of the Quechua stylesheets.
A description of the annotation scheme can be found in trunk/treebanks/annotation_guidelines/anotacion.pdf (dependency trees).
Annotate 3.6
---------------
Annotation tool for phrase structures, used for the German and Spanish treebanks. Available from:
http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/annotate.html
The Spanish treebank is annotated according to the AnCora guidelines (phrase structures), but with a simplified tagset.
The German treebank is annotated according to the TigerCorpus guidelines (phrase structures)
TreeAligner
-------------
Tool used to annotate the alignments of the parallel treebanks (so far only Spanish-German). Available from:
http://kitt.cl.uzh.ch/kitt/treealigner
In order to visualize the parallel trees Spanish-German, download TreeAligner and open one of the 'alignments' files.