Parsing

Aidan Sawyer edited this page May 18, 2018 · 3 revisions

Introduction

The text file format I’ve decided upon is split into a header and a body. Header elements (enumerated in options.txt) are rigidly specified: each one is identified statically by its line number. Body elements are identified programmatically as they are parsed, according to the starting character and capitalization of each line.

Header

The header layout is controlled by the contents of the 'config.txt' file: the key on line N of 'config.txt' determines which field line N of each item text file is assigned to.

Header Structure

| Line# | Config.txt key | Sample item.txt value (content) |
|-------|----------------|---------------------------------|
| 1 | TITLE | The Title of the Item |
| 2 | SERIES | Volume 123, Number 04 |
| 3 | DATE_ISSUED | 1990-10-23 |
| 4 | TITLE_ALTERNATIVE | The Subtitle of the item |
| 5 | FILENAME | Filename_of_given_type.pdf |

Example Config.txt

TITLE
SERIES
DATE_ISSUED
TITLE_ALT
FILENAME

Example Sample_item.txt

The Title of the Item
Volume 123, Number 04
1990-10-23
The Subtitle of the item
Filename_of_given_type.pdf
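
The line-number mapping described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual parser: the function name `parse_header` is hypothetical, and it assumes that 'config.txt' holds one key per line and that the header occupies the first lines of the item file in the same order.

```python
def parse_header(config_text, item_text):
    """Pair each config key with the item line at the same position.

    Assumption (not from the original page): one key per config line,
    and header lines in the item file appear in the same order.
    """
    # Keys from config.txt, ignoring blank lines.
    keys = [line.strip() for line in config_text.splitlines() if line.strip()]
    lines = item_text.splitlines()

    # Line N of the item file is assigned to key N of the config file.
    header = {key: lines[i].strip() for i, key in enumerate(keys) if i < len(lines)}

    # Whatever follows the header lines is treated as the body.
    body = lines[len(keys):]
    return header, body


config_text = "TITLE\nSERIES\nDATE_ISSUED\nTITLE_ALT\nFILENAME\n"
item_text = (
    "The Title of the Item\n"
    "Volume 123, Number 04\n"
    "1990-10-23\n"
    "The Subtitle of the item\n"
    "Filename_of_given_type.pdf\n"
)

header, body = parse_header(config_text, item_text)
print(header["DATE_ISSUED"])  # prints 1990-10-23
```

Because the mapping is purely positional, reordering the keys in the config file changes how every item file is read, with no change to the parsing code.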

...

Body

Parsing Triggers Key

| Match | Name | Description | Effect |
|-------|------|-------------|--------|
| // | Comment/Note | Note by worker about item's accuracy | (optionally) add note to dc.description |
| - | Element Mode | Change element type by what is found in inner text | switch which element is being added (dc.contributor to dc.description.tableofcontents) |
| [A-Z\s]+ | All Caps | Refine qualifier by the ALL CAPS text | switch qualifiers within elements (dc.contributor.author to dc.contributor.illustrator) |
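
As a rough sketch of how these triggers might be checked, the Python below classifies a single body line. It is an assumption-laden illustration rather than the project's implementation: the function name `classify_line`, the returned labels, the fallback "value" category for lines that match no trigger, and the stripping of dashes from both ends of an element line are all hypothetical choices. The sample lines are invented for the example.

```python
import re

# The All Caps trigger from the table: uppercase letters and whitespace only.
ALL_CAPS = re.compile(r"[A-Z\s]+")


def classify_line(line):
    """Return a (kind, text) pair for one body line.

    The kinds ("comment", "element", "qualifier", "value", "blank") are
    hypothetical labels chosen for this sketch.
    """
    text = line.strip()
    if not text:
        return ("blank", "")
    if text.startswith("//"):
        # Worker's note about the item; could be added to dc.description.
        return ("comment", text[2:].strip())
    if text.startswith("-"):
        # Element mode: the inner text names the element to switch to.
        return ("element", text.strip("-").strip())
    if ALL_CAPS.fullmatch(text):
        # All caps: switch the qualifier within the current element.
        return ("qualifier", text)
    # Assumption: anything else is a value for the current element/qualifier.
    return ("value", text)


sample_body = [
    "// check spelling of names",
    "- Contributors",
    "AUTHOR",
    "Jane Doe",
    "ILLUSTRATOR",
]
for line in sample_body:
    print(classify_line(line))
```

The order of the checks matters: the comment and element prefixes are tested before the all-caps pattern, so a line is only treated as a qualifier when it has no leading trigger character.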