UD_French-Sequoia is an automatic conversion of the French Sequoia Treebank corpus. The conversion was done with the Grew software and a Graph Rewriting System.
The first version of the Sequoia Corpus was presented in (Candito & Seddah, 2012).
The whole corpus contains 70,624 tokens in 3,099 sentences.
In UD_French-Sequoia, data were randomly split into:
- fr_sequoia-ud-test.conllu: 10,050 tokens in 456 sentences
- fr_sequoia-ud-dev.conllu: 10,013 tokens in 412 sentences
- fr_sequoia-ud-train.conllu: 50,561 tokens in 2,231 sentences
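The figures for each file can be checked with a short script. The following is a minimal sketch, not the tool used to produce the numbers above: the function name `count_conllu` and the sample text are illustrative, and it assumes that "tokens" means lines with a plain integer ID (multiword-token ranges such as `1-2` and empty nodes such as `1.1` are skipped). If the official counts use a different convention, the totals may differ.

```python
def count_conllu(text):
    """Return (sentences, words) for CoNLL-U formatted text.

    Assumption: a "word" is a line whose ID field is a plain integer.
    """
    sentences = 0
    words = 0
    in_sentence = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Comment lines (sent_id, text, ...) carry no tokens.
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        token_id = line.split("\t", 1)[0]
        if token_id.isdigit():
            words += 1
    return sentences, words


# Made-up two-sentence sample; the sent_id values are hypothetical.
SAMPLE = (
    "# sent_id = example-1\n"
    "1\tLe\tle\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tchat\tchat\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tdort\tdormir\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = example-2\n"
    "1-2\tdu\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "1\tde\tde\tADP\t_\t_\t3\tcase\t_\t_\n"
    "2\tle\tle\tDET\t_\t_\t3\tdet\t_\t_\n"
    "3\tpain\tpain\tNOUN\t_\t_\t0\troot\t_\t_\n"
)

print(count_conllu(SAMPLE))  # (2, 6)
```

To apply it to a real file, read the file as UTF-8 text and pass the contents to `count_conllu`.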
The original sentences of the corpus are taken from:
- French Europarl (sent_id prefix: Europar.550)
- Wikipédia Fr (sent_id prefix: frwiki_50.1000)
- Newspaper Est Républicain (sent_id prefix: annodis.er)
- European Medicines Agency (sent_id prefixes: emea-fr-dev and emea-fr-test)
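Because each source is identified by its sent_id prefix, sentences can be grouped by origin directly from the `# sent_id = ...` comment lines. The sketch below is illustrative only: the names `SOURCE_PREFIXES`, `source_of` and `count_sources` are hypothetical, and the identifiers in the sample (everything after the prefix) are made up rather than taken from the corpus.

```python
from collections import Counter

# Prefixes as listed above; the labels are informal names for each source.
SOURCE_PREFIXES = {
    "Europar.550": "French Europarl",
    "frwiki_50.1000": "Wikipédia Fr",
    "annodis.er": "Est Républicain",
    "emea-fr-dev": "European Medicines Agency",
    "emea-fr-test": "European Medicines Agency",
}


def source_of(sent_id):
    """Map a sent_id to its source, or "unknown" if no prefix matches."""
    for prefix, source in SOURCE_PREFIXES.items():
        if sent_id.startswith(prefix):
            return source
    return "unknown"


def count_sources(text):
    """Count sentences per source in CoNLL-U formatted text."""
    counts = Counter()
    for line in text.splitlines():
        if line.startswith("# sent_id"):
            _, sep, value = line.partition("=")
            if sep:
                counts[source_of(value.strip())] += 1
    return counts


# Hypothetical comment lines; the numeric suffixes are invented.
SAMPLE_IDS = (
    "# sent_id = annodis.er_00001\n"
    "# sent_id = emea-fr-dev_00002\n"
    "# sent_id = emea-fr-test_00003\n"
    "# sent_id = Europar.550_00010\n"
    "# sent_id = frwiki_50.1000_00005\n"
)

for source, n in sorted(count_sources(SAMPLE_IDS).items()):
    print(source, n)
```

Both EMEA prefixes are mapped to the same source here; keep them separate if the dev/test distinction of the original corpus matters for your use.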
The conversion was performed by Bruno Guillaume with the Graph Rewriting System developed by Bruno Guillaume and Guy Perrier.
The Sequoia Corpus was presented in (Candito & Seddah, 2012) and revised later, notably during the project of deep annotation described in (Candito & al. 2014) and (Perrier & al. 2014).
(Candito & Seddah, 2012) Marie Candito and Djamé Seddah. (2012) Le corpus Sequoia : annotation syntaxique et exploitation pour l'adaptation d'analyseur par pont lexical. Proc. of TALN 2012, Grenoble, France.
(Candito & al. 2014) Marie Candito, Guy Perrier, Bruno Guillaume, Corentin Ribeyre, Karën Fort, Djamé Seddah and Éric de la Clergerie. (2014) Deep Syntax Annotation of the Sequoia French Treebank. Proc. of LREC 2014, Reykjavik, Iceland.
(Perrier & al. 2014) Guy Perrier, Marie Candito, Bruno Guillaume, Corentin Ribeyre, Karën Fort and Djamé Seddah. (2014) Un schéma d'annotation en dépendances syntaxiques profondes pour le français. Proc. of TALN 2014, Marseille, France.
- 2018-04-15 v2.2
  - Subtyping of the obl relation into 3 relations: obl:arg, obl:mod, obl:agent
  - Several corrections in the original corpus and in the conversion process
- 2017-11-15 v2.1
  - Manual corrections in the original Treebank
  - Application of an updated conversion system, taking into account new decisions taken for harmonisation of several French Treebanks (causatives, copulas, auxiliaries)
- 2017-03-01 v2.0
  - First release in UD