This is the French novel corpus for the ELTeC, the European Literary Text Collection, produced by the COST Action Distant Reading for European Literary History (CA16204, https://distant-reading.net). The current version is v1.0.1.
Note that this corpus is also available in a linguistically annotated format prepared for direct import into the text analysis tool TXM; see DOI 10.5281/zenodo.4274478. This format is based on v1.0.0 of the corpus.
An overview of the authors and works represented in the collection is available at https://distantreading.github.io/ELTeC/fra/index.html.
- Collection editors: Christof Schöch and Lou Burnard
- Contributors: Pia Geißel, Rezearta Murati, Evegnia Fileva
- Sources: Bibliothèque nationale de France (Gallica), Ebooks libres et gratuits / Bibliothèque électronique du Québec, CLiGS textbox, Wikisource, Bibebook.com, Atramenta, OBVIL, Project Gutenberg.
All texts included in this collection are in the public domain. No claim to copyright or similar protections is made for the composition of the corpus, the collection and presentation of the metadata, or the transcription and encoding of the texts.
If you use this corpus in your research or teaching, please follow good scholarly practice and use the following citation suggestion to acknowledge your source:
- French Novel Corpus (ELTeC-fra), edited by Christof Schöch and Lou Burnard. Version v1.0.1, April 2021. In: European Literary Text Collection (ELTeC). COST Action Distant Reading for European Literary History. DOI: https://doi.org/10.5281/zenodo.4662433
@collection{schoech_ELTeCfra_2020,
title = {French Novel Collection (ELTeC-fra)},
maintitle = {European Literary Text Collection (ELTeC)},
editor = {Schöch, Christof and Burnard, Lou},
version = {v1.0.1},
year = {2021},
month = {4},
publisher = {COST Action Distant Reading for European Literary History},
url = {https://github.com/COST-ELTeC/ELTeC-fra/},
doi = {10.5281/zenodo.4662433},
}
General information about ELTeC releases is available at https://github.com/COST-ELTeC/ELTeC.
The concept DOI for all versions of ELTeC-fra is the following: https://doi.org/10.5281/zenodo.3462535.
- recent changes: A level-2-encoded version valid against the level2_strict schema (with <s>...</s> tags) is now available (June 2023).
- recent changes: A linguistically-annotated version (level 2 encoding) is now available, ahead of a v2.0.0 release.
- v1.0.1, April 2021: This release includes 100 novels in level 1 encoding. Minor updates to the metadata were provided with this release. The DOI of this release is: 10.5281/zenodo.4662433
- v1.0.0, November 2020: This release includes 100 novels in level 1 encoding. With this release, a corpus compliance score (E5C) of 100 was reached. The DOI of this release is: 10.5281/zenodo.4264647
- v0.9.1, June 2020: This release includes 100 novels in level 1 encoding. Some further enhancements remain planned as work towards v1.0.0. See: v0.9.1 and issues in milestone v1.0.0. The E5C score of this release is 97.7/100.
- v0.9.0, May 2020: There are now 100 novels in level 1 encoding. The corpus composition criteria are met and major bugs are fixed, but some enhancements are still planned as work towards v1.0.0. See: v0.9.0 and issues in milestone v1.0.0.
- v0.8.0 (deprecated), November 2019: The corpus contains 82 novels encoded at level 1. The corpus composition criteria are not yet fully fulfilled.