---
bibtex: |
  @incollection{Betti2017-BETPBC,
    editor = {J. Odijk and A. Van Hessen},
    title = {@PhilosTEI: Building Corpora for Philosophers},
    author = {Arianna Betti and Martin Reynaert and Hein Van Den Berg},
    pages = {379--392},
    year = {2017},
    booktitle = {Clarin in the Low Countries}
  }
---

For philosophers to be able to take a computational turn in their field, especially if that field relies heavily on historical material, it is crucial to be able to build high-quality, easily and freely accessible corpora in a sustainable format, composed of multi-language, multi-script books from different historical periods. At the moment, corpora matching these needs are virtually non-existent. Within the CLARIN-NL project @PhilosTEI, we have addressed the problem of building corpora of this kind by developing an open-source, web-based, user-friendly workflow from textual images to TEI, based on the state-of-the-art open-source OCR engine Tesseract and a multi-language version of TICCL, a powerful OCR post-correction tool. We have demonstrated the utility of the @PhilosTEI tool by applying it to a multilingual, multi-script corpus of important 18th- to 20th-century European philosophical texts.

Web-based OCR for Gothic and Roman typefaces

The main objective of the CLARIN-NL project @PhilosTEI was to develop a web-based, user-friendly workflow from scanned images of text to TEI (Text Encoding Initiative). (p379)
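To make the last two stages of such an image-to-TEI workflow concrete, here is a minimal sketch. It is not the @PhilosTEI implementation. It assumes the OCR step (e.g. Tesseract) has already produced plain text for each page. The hypothetical `post_correct` function applies a toy lookup table of known OCR errors; this is only a stand-in for post-correction and does not reproduce how TICCL works. The hypothetical `pages_to_tei` function then wraps the pages in a minimal TEI document: a `teiHeader` with the required `titleStmt`, `publicationStmt` and `sourceDesc`, plus a `body` with one `pb` (page break) per page and one `p` per paragraph.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Serialize TEI elements with a default xmlns instead of an "ns0:" prefix.
ET.register_namespace("", TEI_NS)


def post_correct(text, corrections):
    """Replace known OCR error tokens with corrected forms, line by line.

    Toy stand-in for OCR post-correction (assumption): `corrections` is a
    hypothetical mapping from erroneous tokens to correct ones. Empty lines
    are kept, so paragraph breaks ("\n\n") survive.
    """
    fixed = []
    for line in text.splitlines():
        fixed.append(" ".join(corrections.get(tok, tok) for tok in line.split()))
    return "\n".join(fixed)


def pages_to_tei(title, pages, lang):
    """Wrap per-page OCR text in a minimal TEI document and return it as a string."""
    ns = "{%s}" % TEI_NS
    tei = ET.Element(ns + "TEI")

    # Minimal header: titleStmt, publicationStmt and sourceDesc are required by TEI.
    file_desc = ET.SubElement(ET.SubElement(tei, ns + "teiHeader"), ns + "fileDesc")
    title_stmt = ET.SubElement(file_desc, ns + "titleStmt")
    ET.SubElement(title_stmt, ns + "title").text = title
    pub_stmt = ET.SubElement(file_desc, ns + "publicationStmt")
    ET.SubElement(pub_stmt, ns + "p").text = "Unpublished OCR output"
    source_desc = ET.SubElement(file_desc, ns + "sourceDesc")
    ET.SubElement(source_desc, ns + "p").text = "Transcribed from scanned page images"

    # Body: a page-break milestone for each page, then its paragraphs.
    text_el = ET.SubElement(tei, ns + "text", {XML_LANG: lang})
    body = ET.SubElement(text_el, ns + "body")
    for number, page_text in enumerate(pages, start=1):
        ET.SubElement(body, ns + "pb", {"n": str(number)})
        for para in page_text.split("\n\n"):
            if para.strip():
                # Join OCR line breaks inside a paragraph into single spaces.
                ET.SubElement(body, ns + "p").text = " ".join(para.split())
    return ET.tostring(tei, encoding="unicode")


if __name__ == "__main__":
    page = post_correct("Tbe first page\nof tbe text.\n\nA second paragraph.",
                        {"Tbe": "The", "tbe": "the"})
    print(pages_to_tei("Sample", [page, "Second page."], "en"))
```

A real multi-language, multi-script corpus would also need per-document language and script metadata in the header. Those details are left out here to keep the sketch short.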

Computational tools and methods have significantly impacted philosophical research (van den Berg et al., 2014; Ess, 2004). An ontology that aids philosophical research is presented by Grenon and Smith (2009).

Challenges with copyright

Importantly, many of these editions are not open access, so their use within digital philosophy projects is severely limited. This applies, e.g., to commercial electronic editions and to the content provided by the TLG (Thesaurus Linguae Graecae). (p381)