This demonstration shows the implementation of a pipeline going from PAGE XML to TEI Publisher, created within the framework of the LECTAUREP project.
LECTAUREP is a project jointly led by Inria (ALMAnaCH) and the Archives nationales de France (DMC). Its purpose is to facilitate the exploration of thousands of pages of directories listing minutes and deeds drawn up by Parisian notaries between the beginning of the 19th century and the mid-20th century. To do so, LECTAUREP relies on automatic transcription performed with Kraken via the eScriptorium web application.
Images are loaded on the platform, then transcribed and annotated, and finally exported to PAGE XML files. The last section of the pipeline aims to offer users a platform to visualise, query and read the pages of the directories. An almost ready-to-use solution consists in using TEI Publisher, which requires transforming the PAGE XML files into compliant TEI XML.
LEPIDEMO demonstrates how this transformation can be plugged into eScriptorium as a simple Python script.
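To give an idea of what such a transformation involves, here is a minimal sketch using only the Python standard library. It is not the code used by LEPIDEMO: the function name `page_to_tei`, the sample input and the mapping chosen here (each PAGE `TextRegion` becomes a TEI `<ab>`, each `TextLine` becomes an `<lb/>` followed by its text) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Tiny hand-written PAGE XML sample (illustrative, not a real eScriptorium export).
SAMPLE_PAGE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page_001.jpg">
    <TextRegion id="r1">
      <TextLine id="l1"><TextEquiv><Unicode>12 janvier</Unicode></TextEquiv></TextLine>
      <TextLine id="l2"><TextEquiv><Unicode>Vente par Dupont</Unicode></TextEquiv></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def page_to_tei(page_xml: str, title: str = "Untitled page") -> str:
    """Convert a PAGE XML string into a minimal TEI XML string (hypothetical mapping)."""
    root = ET.fromstring(page_xml)
    # The PAGE namespace changes with the schema version, so read it from the root tag.
    page_ns = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""

    def q(tag: str) -> str:
        """Qualify a PAGE element name with the namespace found in the input."""
        return f"{{{page_ns}}}{tag}" if page_ns else tag

    def t(tag: str) -> str:
        """Qualify a TEI element name with the TEI namespace."""
        return f"{{{TEI_NS}}}{tag}"

    # Serialise TEI elements with a default namespace instead of an ns0: prefix.
    ET.register_namespace("", TEI_NS)

    tei = ET.Element(t("TEI"))
    file_desc = ET.SubElement(ET.SubElement(tei, t("teiHeader")), t("fileDesc"))
    ET.SubElement(ET.SubElement(file_desc, t("titleStmt")), t("title")).text = title
    ET.SubElement(ET.SubElement(file_desc, t("publicationStmt")), t("p")).text = "Unpublished"
    ET.SubElement(ET.SubElement(file_desc, t("sourceDesc")), t("p")).text = "Converted from PAGE XML"

    div = ET.SubElement(ET.SubElement(ET.SubElement(tei, t("text")), t("body")), t("div"))

    for region in root.iter(q("TextRegion")):
        ab = ET.SubElement(div, t("ab"))
        for line in region.iter(q("TextLine")):
            # Only the line-level transcription is read (direct TextEquiv child).
            unicode_el = line.find(f"{q('TextEquiv')}/{q('Unicode')}")
            lb = ET.SubElement(ab, t("lb"))
            # The line text follows the <lb/> milestone as its tail.
            lb.tail = (unicode_el.text or "") if unicode_el is not None else ""

    return ET.tostring(tei, encoding="unicode")


if __name__ == "__main__":
    print(page_to_tei(SAMPLE_PAGE, title="Sample directory page"))
```

A real conversion would also need to carry over coordinates, region types and image references, and to produce a header and structure that match what the TEI Publisher application expects; those parts are deliberately left out of this sketch.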
The demonstration can be followed step by step using the `lepidemo.ipynb` Jupyter notebook:
- Create a Python virtual environment: `virtualenv -p python3 [ENVIRONMENT NAME]`
- Activate it: `source [ENVIRONMENT NAME]/bin/activate`
- Then launch Jupyter with `jupyter notebook`
- Open `lepidemo.ipynb` in the Jupyter browser, then follow the instructions in the cells.
Chagué, A., & Scheithauer, H. LEPIDEMO, a Pipeline Demonstrator for LECTAUREP to go from eScriptorium to TEI-Publisher [Computer software]