conllu2Parlamint_TEI

Corpus Parlamint Directory Structure

The starting parliamentary corpus is structurally divided by years, from 2013 to 2020, each of which contains the sessions in TEI XML format. Each session contains sections, motions, information on the speaker on her role, and above all the text of the speeches divided into small narratives contained within the tag. These textual units, which can correspond to different sentences, are the ones that are of interest for the project and have been linguistically analyzed. Each session therefore contains a variable number of segments (50 - 100-150 just to give an order of ideas) with its own unique identifier whose text has been linguistically analyzed producing files in CoNLL-U format.

Corpus Parlamint Linguistically Analyzed Directory Structure

It will therefore appear that each session will have associated a directory of linguistically analyzed files in which number corresponds to the number of segments it contains. The names of the files have meaning since the file corresponding to the session YYY.xml of a certain year containing N segments will correspond to N files contained in the YYY directory and logically identified by the files YYY.seg1.udner, YYY.seg2, udener, … YYY.segN.udner

The program takes as input the list of files of a year, one per line, and assumes that for each file Parlamint_xxxx_sed_n.xml with n = 1,2,3 .... exist a directory Parlamint_xxxx_sed_n with as many segment files as are the segments present in Parlamint_xxxx_sed_n.xml with the constraint of having the name Parlamint_xxxx_sed_n.seg_m.ud.udner with m=1,2,3.... number of segment.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
main.cpp		main.cpp
main.h		main.h
type.h		type.h
util.cpp		util.cpp
util.h		util.h

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

conllu2Parlamint_TEI

About

Releases

Packages

Languages

cnr-ilc/conllu2Parlamint_TEI

Folders and files

Latest commit

History

Repository files navigation

conllu2Parlamint_TEI

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages