The following treebanks (all of which are beta versions) were developed following the guidelines of the Perseus Dependency Treebanks. See also <en.pedalion.org/treebanks>. (NB: For experimental reasons, some trees use the part-of-speech tag 'b', which in most cases corresponds to 'd' or 'c'.)
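For readers who want to check how often the experimental 'b' tag occurs in a given file, the following is a minimal sketch rather than part of our own tooling. It assumes the files use the Perseus-style XML layout, in which each token is a `word` element whose `postag` attribute starts with the part-of-speech letter. The sample sentence and the function names `pos_counts` and `words_with_pos` are invented for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented two-token sample in the assumed Perseus-style layout
# (not taken from any of the treebanks listed on this page).
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="καὶ" lemma="καί" postag="b--------" relation="AuxY" head="2"/>
    <word id="2" form="λέγει" lemma="λέγω" postag="v3spia---" relation="PRED" head="0"/>
  </sentence>
</treebank>"""


def pos_counts(xml_text):
    """Count tokens per part-of-speech letter (first character of postag)."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for word in root.iter("word"):
        postag = word.get("postag") or ""
        # Tokens without a postag are counted under "?".
        counts[postag[:1] or "?"] += 1
    return counts


def words_with_pos(xml_text, pos):
    """Return the forms of all tokens whose postag starts with `pos`."""
    root = ET.fromstring(xml_text)
    return [
        word.get("form")
        for word in root.iter("word")
        if (word.get("postag") or "").startswith(pos)
    ]


if __name__ == "__main__":
    print(pos_counts(SAMPLE))
    print(words_with_pos(SAMPLE, "b"))
```

Listing the 'b' tokens this way makes it possible to decide case by case whether 'd' or 'c' is the appropriate equivalent before combining these files with other treebanks.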
What? All papyrus texts, which will be offered to the Sematia project, are annotated with Trismegistos IDs (both at the text level and at the word level).
Who? Alek Keersmaekers is the principal annotator of this corpus. A number of texts were annotated by Louis Verreth (based on a first automated parsing by Alek Keersmaekers). In addition, there are student contributions, partly based on a first automated parsing by Alek Keersmaekers, by Anna Bloemen; Mathieu Cuijpers; Niels De Ridder; Sanderijn Gijbels; Yoran Joosten; Yordi Lenaerts; Jonas Roose; Tibo Schuermans; Eva Uffing; Chiara Van der Hasselt; Lisa Vanhee; Jolien Volders; Anne-Sophie Vounckx (KU Leuven undergraduate students of Greek in the academic years 2017-2018 and 2018-2019). The entire annotation process is supervised by Alek Keersmaekers. More details can be found in the corresponding XML file.
How much? Currently ca. 12K tokens.
First release? 2018
Updates? February 2019: addition of student annotations.
What? Pedalion.org offers a modular grammar of Ancient Greek. The English version, en.pedalion.org, is still under construction. This grammar relies on a large number of original example sentences, many of which have been treebanked.
Who? Supervised by Toon Van Hal. Most sentences were annotated by Toon Van Hal; all annotations are based on a first automated parsing by Alek Keersmaekers. In addition, there are student contributions by Mathieu Cuijpers; Sanderijn Gijbels; Yoran Joosten; Yordi Lenaerts; Eva Uffing; Chiara Van der Hasselt; Lisa Vanhee; Jolien Volders; Anne-Sophie Vounckx (each student encoded ca. 300 tokens).
How much? Currently ca. 20K tokens.
First release? 2018
Updates? February 2019: addition of student annotations; correction of previous data | August 2019: addition and correction of sentences | Still to be done: the metadata of the sentences will be enhanced.
What? Parts of Genesis
Who? Jonas Roose, based on a preparsed version by Alek Keersmaekers. Partially corrected by Toon Van Hal & Alek Keersmaekers. We were happy to rely on the part-of-speech annotation by Kraft, R., ed. 1988. Morphologically Analyzed Septuagint (version 1.0). Computer-Assisted Tools for Septuagint Studies (CATSS), University of Pennsylvania. http://ccat.sas.upenn.edu/gopher/.
How much? Currently ca. 20K tokens.
First release? 2019.
Updates? August 2019: addition and correction of sentences | We will try to come up with a complete treebank of Genesis.
What? Selected fables ascribed to Aesop
Who? Annotated by Colin Swaelens, Eva Uffing. Corrected by Sanderijn Gijbels and Yoran Joosten. Based on a preparsed text by Alek Keersmaekers. Partially corrected by Toon Van Hal.
How much? ca. 7.5K tokens.
First release? 2019
What?
Who? Annotated by Sanderijn Gijbels. Based on a preparsed text by Alek Keersmaekers. Partially corrected by Toon Van Hal.
How much? ca. 9K tokens.
First release? 2019
What?
Who? Annotated by Toon Van Hal, with student contributions by Mathieu Cuijpers; Sanderijn Gijbels; Yoran Joosten; Yordi Lenaerts; Eva Uffing; Chiara Van der Hasselt; Lisa Vanhee and Jolien Volders (KU Leuven Bachelor 3, 2018-2019). Based on a preparsed text by Alek Keersmaekers. Controlled by Toon Van Hal, Sanderijn Gijbels and Yoran Joosten.
How much? ca. 10K tokens.
First release? 2019
What?
Who? Annotated by Colin Swaelens. Based on a preparsed text by Alek Keersmaekers.
How much? ca. 2.2K tokens.
First release? 2019.
What?
Who? Annotated by Yoran Joosten. Based on a preparsed text by Alek Keersmaekers.
How much? ca. 650 tokens.
First release? 2019.
Updates? Will be expanded and corrected soon.
What? Lucian, Prometheus on Caucasus, in: Lucian, Works, with an English translation by A. M. Harmon. Cambridge, MA, Harvard University Press; London, William Heinemann Ltd. 1915. 2. Via: http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0062.tlg020.
Who? Supervised by Toon Van Hal, with student contributions by De Schutter, A.; Knapen, C.; Weets, C.; Van Nunen, E.; De Smet, Isabeau; Van Bever, L.; Valadou, M.; De Backer, Olympe; Keupers, S.; Vangenechten, Thomas; Valgaeren, Thomas; Nelis, Tilke.
How much? ca. 2.5K tokens.
First release? 2018
Updates? Updated file in April 2019, with corrections by Toon Van Hal and Alek Keersmaekers.
What?
Who? Annotated by Sanderijn Gijbels, corrected by Toon Van Hal. Based on a preparsed text by Alek Keersmaekers.
How much? ca. 1300 tokens.
First release? 2019
What?
Who? Annotated by Yoran Joosten. Based on a preparsed text by Alek Keersmaekers. Partially corrected by Alek Keersmaekers and Toon Van Hal.
How much? ca. 5.5K tokens.
First release? 2019
What?
Who? Annotated by Yoran Joosten. Based on a preparsed text by Alek Keersmaekers.
How much? ca. 640 tokens.
First release? 2019
What?
Who? Annotated by Alek Keersmaekers
How much? ca. 11K tokens.
First release? 2018
What? Theocritus, Mimnermus, Semonides
Who? Annotated by Louis Verreth and Wouter Mercelis, on the basis of an automatically parsed version by Alek Keersmaekers, and partially corrected by Toon Van Hal.
How much? ca. 1.5K tokens.
What? Lysias (Or. 24) with an English translation by W.R.M. Lamb, M.A. Cambridge, MA, Harvard University Press; London, William Heinemann Ltd. 1930.
Who? Annotated by Louis Verreth, on the basis of an automatically parsed version by Alek Keersmaekers, and corrected by Toon Van Hal
How much? ca. 1.5K tokens.
First release? 2018, update in April 2019.
What?
Who? Annotated by Yoran Joosten. Based on a preparsed text by Alek Keersmaekers. Partially corrected by Alek Keersmaekers & Toon Van Hal
How much? ca. 8K tokens.
First release? 2019.
Updates? Will be expanded and corrected soon.
What?
Who? Annotated by Yoran Joosten. Based on a preparsed text by Alek Keersmaekers.
How much? ca. 650 tokens.
First release? 2019.
What? Plato. Platonis Opera, ed. John Burnet. Oxford University Press. 1903.
Who? Annotated by Wouter Mercelis, on the basis of an automatically parsed version by Alek Keersmaekers, and corrected by Toon Van Hal
How much? ca. 1.8K tokens.
First release? 2018, update in April 2019.
What? This database-generated list contains a number of modifications to the existing Ancient Greek Dependency Treebanks. We are currently conducting experiments with the automated parsing of Greek, and we are therefore attempting to homogenize the training corpus. The modifications are of various kinds. What we believe to be clear mistakes make up only a minor, though not insubstantial, part of the file: most suggestions are made for purposes of homogenization. Later versions of this file will likely specify the nature of each modification. As this is work in progress, it is safe to say that this file may also contain a number of improvements for the worse.
The modifications are implemented in our own treebank search device, DendroSearch.
Who? Toon Van Hal and Alek Keersmaekers.
How much? The current release version contains modifications affecting ca. 120K tokens.
First release? 2018
Updates? March 2019; April 2019.
The following texts are currently being annotated or corrected:
- Hippocrates' Oath – annotated by Louis Verreth
- Epictetus – annotated by Jonas Roose
- Sextus Empiricus – annotated by Yoran Joosten
- Isocrates' letters (selection) – annotated by Toon Van Hal
For our experiments with automated analysis, we gratefully rely on the large number of readily available treebanks:
- Perseus Treebanks: https://perseusdl.github.io/treebank_data/
- PROIEL Treebanks: https://proiel.github.io/
- The Gorman Treebanks: https://github.com/rgorman/Greek_Dependency_Treebanks
- Harrington's Treebanks: https://perseids-project.github.io/harrington_trees/
- The Sematia Project: https://github.com/ezhenrik/sematia
Our treebank data was created and edited with the help of the Arethusa application (https://github.com/alpheios-project/arethusa) as provided by the Perseids Project at Tufts University (https://perseids.org). Arethusa has received support from the Andrew W. Mellon Foundation, the Institute of Museum and Library Services, Tufts University, and the Humboldt Chair of Digital Humanities at Leipzig. Arethusa is now jointly maintained by the Perseids Project at Tufts University and The Alpheios Project, Ltd.
Since January 2019, this work has also been partly funded through an FWO research grant (Research Foundation Flanders).
We will assign a Creative-Commons licence to our treebanks, probably the following one: https://creativecommons.org/licenses/by-sa/4.0/. Please feel free to contact us for further questions.
toon -dot- vanhal -emailsign- kuleuven.be; alek -dot- keersmaekers -emailsign- kuleuven.be