This repository contains the data for the Ancient Greek Dependency Treebank 2.1. In this release, no change has been made to the texts already contained in version 2.0. Information about the editions of the texts can be found within the files. The currently available texts are:
Author | Text |
---|---|
Aesop | Fables (1.1-1.50) |
Aeschylus | Agamemnon |
Aeschylus | Eumenides |
Aeschylus | Libation Bearers |
Aeschylus | Persians |
Aeschylus | Prometheus Bound |
Aeschylus | Seven Against Thebes |
Aeschylus | Suppliants |
Athenaeus | The Deipnosophists (12-13) |
Diodorus Siculus | Library (11) |
Herodotus | Histories (1) |
Hesiod | Shield of Heracles |
Hesiod | Theogony |
Hesiod | Works and Days |
Homer | Iliad |
Homer | Odyssey |
Lysias | Oration 1 |
Lysias | Oration 14 |
Lysias | Oration 15 |
Lysias | Oration 23 |
Plato | Euthyphro |
Plutarch | Alcibiades |
Plutarch | Lycurgus |
Polybius | Histories (1) |
Pseudo Apollodorus | Library (1.1.1-1.4.1) |
Pseudo Homer | Hymn to Demeter |
Sophocles | Ajax |
Sophocles | Antigone |
Sophocles | Electra |
Sophocles | Oedipus Tyrannus |
Sophocles | Trachinae |
Thucydides | Histories (1) |
The data have been annotated semi-automatically: the annotation was done manually, but many corrections were then applied automatically in order to improve consistency. Morphology has been annotated with the help of the Morpheus tagger.
The full tagsets can be consulted in TAGSETS.xml.
Data have been annotated using the following guidelines:
- Guidelines for the Syntactic Annotation of the Ancient Greek Dependency Treebank (1.1) (GSAAGDT)
- Guidelines for the Ancient Greek Dependency Treebank 2.0 (GAGDT)
The GAGDT 2.0 can be regarded as an extension of the GSAAGDT 1.1 that makes those guidelines more stringent. The GAGDT 2.0 also allows semantic annotation. Each annotation file specifies which of the above guidelines has been adopted.
This release mainly concerns the normalization and harmonization of Ancient Greek characters, XML structure, and morphological annotation.
The structure of the original XML files (i.e., the one defined by the XML schema that is ingested by the Perseids platform, where annotations are performed) has been changed in order to make it more informative and easier to query. The `treebank` root element identifies the version of the release (`@version`) and the cts for each text (`@cts`). The (pseudo-TEI) `header` element contains information/credits about the creation of the file. The `biblStruct` element contains information about the ancient author and text, which helps in interpreting `@cts`.
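As a rough illustration of how the reorganized structure can be queried, the following sketch reads the release version and the cts from the `treebank` root and checks for the `header` and `biblStruct` elements. The miniature document, its attribute values, and the function name `describe` are invented for the example; only the element and attribute names come from the description above, and the exact nesting in the real files may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature file: the values below are invented for illustration
# and are not copied from the release.
SAMPLE = """<treebank version="2.1" cts="urn:cts:greekLit:tlg0012.tlg001">
  <header>credits about the creation of the file</header>
  <biblStruct>ancient author and text</biblStruct>
  <sentence>
    <word form="μῆνιν" lemma="μῆνις" postag="n-s---fa-"/>
  </sentence>
</treebank>"""


def describe(xml_text):
    """Summarize the top-level metadata of a treebank document."""
    root = ET.fromstring(xml_text)
    return {
        "root": root.tag,
        "version": root.get("version"),
        "cts": root.get("cts"),
        # iter() searches at any depth, so no nesting level is assumed here.
        "has_header": next(root.iter("header"), None) is not None,
        "has_biblStruct": next(root.iter("biblStruct"), None) is not None,
        "sentences": sum(1 for _ in root.iter("sentence")),
        "words": sum(1 for _ in root.iter("word")),
    }


info = describe(SAMPLE)
print(info)
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.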
The original structure of `sentence` and `word` elements is preserved, with some normalization of nodes that are not linguistically relevant: `@span` and `@cid` have been deleted, and the display of cts:urn values within `sentence` has been normalized (such values are available at the sentence level, and sometimes also at the word level).
The texts have been checked and normalized with respect to Greek characters and punctuation marks. The decomposed Greek forms (NFD) in `@artificial` (i.e., elliptical) nodes have been converted to composed forms (NFC).
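The difference between decomposed (NFD) and composed (NFC) forms can be shown with Python's standard `unicodedata` module. This is only a minimal sketch of the kind of conversion described above, not the script used for the release; the helper names `to_nfc` and `is_nfc` are made up for the example.

```python
import unicodedata


def to_nfc(text):
    """Return the canonically composed (NFC) form of a string."""
    return unicodedata.normalize("NFC", text)


def is_nfc(text):
    """True if the string is already in composed (NFC) form."""
    return to_nfc(text) == text


# Alpha followed by a combining acute accent: two code points (NFD).
decomposed = "\u03b1\u0301"
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))  # 2 1
print(is_nfc(decomposed), is_nfc(composed))  # False True
```

Both strings render identically, but they compare as different, which is why mixed normalization forms make querying by `@form` or `@lemma` unreliable.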
The `@form` and `@lemma` values for punctuation marks have been corrected and made uniform: both values now contain the form of the punctuation mark, and `@postag` contains `u--------`.
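The convention for punctuation can be checked mechanically. The sketch below flags any word whose `@postag` is `u--------` but whose `@form` and `@lemma` differ. The sample sentence, including the deliberately inconsistent lemma `punc1` and the `id` values, is invented for illustration, and `inconsistent_punctuation` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

PUNCT_POSTAG = "u--------"

# Invented sample: the third word deliberately violates the convention.
SAMPLE = """<sentence>
  <word id="1" form="λέγει" lemma="λέγω" postag="v3spia---"/>
  <word id="2" form="," lemma="," postag="u--------"/>
  <word id="3" form="." lemma="punc1" postag="u--------"/>
</sentence>"""


def inconsistent_punctuation(sentence):
    """Return the ids of punctuation words whose @form and @lemma differ."""
    return [
        word.get("id")
        for word in sentence.iter("word")
        if word.get("postag") == PUNCT_POSTAG
        and word.get("form") != word.get("lemma")
    ]


bad = inconsistent_punctuation(ET.fromstring(SAMPLE))
print(bad)  # ['3']
```

In a file that follows the convention described above, the returned list should be empty.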
The tagsets for morphology have been harmonized. Previous versions of the treebank also had "participle" as a part of speech. Now "participle" is treated only as a kind of mood. Similarly, all words with the morphological category "exclamation" have been assigned the category "interjection".
Some work is still needed in order to satisfactorily deal with the distinction adverb/particle.
A major effort has been made to correct errors contained in `@postag` for nouns. Some problems remain for those nouns which have not been annotated properly (missing values); these will have to be corrected manually.
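Nouns with missing values can be located with a simple check on `@postag`. The sketch below rests on assumptions that should be verified against TAGSETS.xml: that the tag has nine positions, that `n` in the first position marks a noun, and that number, gender, and case sit in positions 3, 7, and 8. The function name `noun_missing_fields` is hypothetical.

```python
# Assumed layout of the 9-character @postag (verify against TAGSETS.xml):
# index 0 = part of speech, 2 = number, 6 = gender, 7 = case.
NOUN_FIELDS = {"number": 2, "gender": 6, "case": 7}


def noun_missing_fields(postag):
    """List the noun features left unspecified ('-') in a postag.

    Returns an empty list for non-nouns and for tags of unexpected length.
    """
    if len(postag) != 9 or postag[0] != "n":
        return []
    return [name for name, index in NOUN_FIELDS.items() if postag[index] == "-"]


print(noun_missing_fields("n-s---fa-"))  # []
print(noun_missing_fields("n-s----a-"))  # ['gender']
print(noun_missing_fields("n--------"))  # ['number', 'gender', 'case']
```

Running such a check over every `word` element gives a worklist of nouns that still need manual correction.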
(in alphabetical order)
Giuseppe G. A. Celano, J. F. Gentile, Robert Gorman, Vanessa Gorman, Jordan Hawkesworth, Yoana Ivanova, Tovah Keynton, Florin Leonte, Alex Lessie, Daniel Lim Libatique, Meg Luthin, Francesco Mambrini, George Matthews, Jack Mitchell, Molly Miller, Jessica Nord, Sean Stewart, Anthony D. Yates, Polina Yordanova, and Sam Zukoff.