Parsing
Aidan Sawyer edited this page May 18, 2018
The text file format I’ve decided upon is broken up into a header and a body.
The elements in the header (enumerated in options.txt) are rigidly specified and statically determined by line number. The elements in the body are determined programmatically as they are parsed, according to the starting character and capitalization of each line.
The header layout is controlled by the contents of the 'config.txt' file: the key on each line of the config determines which field the corresponding line of every item text file is assigned to.
Header Structure
Line# | Config.txt key | Sample item.txt value (content) |
---|---|---|
1 | TITLE | The Title of the Item |
2 | SERIES | Volume 123, Number 04 |
3 | DATE_ISSUED | 1990-10-23 |
4 | TITLE_ALTERNATIVE | The Subtitle of the item |
5 | FILENAME | Filename_of_given_type.pdf |
Example Config.txt
TITLE
SERIES
DATE_ISSUED
TITLE_ALT
FILENAME
Example Sample_item.txt
The Title of the Item
Volume 123, Number 04
1990-10-23
The Subtitle of the item
Filename_of_given_type.pdf
...
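A minimal Python sketch of the header step, assuming the config lists one key per non-blank line and that an item file's first N lines (N = number of keys) are its header, in the same order as the keys, with everything after them forming the body. The function name `parse_header` and its return shape are illustrative, not taken from the project's code.

```python
from __future__ import annotations


def parse_header(config_text: str, item_text: str) -> tuple[dict[str, str], list[str]]:
    """Split an item file into a header dict and the remaining body lines.

    Assumes one key per non-blank line of the config, and that the item's
    first len(keys) lines are the header, in the same order as the keys.
    """
    keys = [line.strip() for line in config_text.splitlines() if line.strip()]
    lines = item_text.splitlines()
    if len(lines) < len(keys):
        raise ValueError(f"expected {len(keys)} header lines, found {len(lines)}")
    # Line N of the item is assigned to key N of the config (positional mapping).
    header = {key: lines[i].strip() for i, key in enumerate(keys)}
    # Everything after the header is left for the line-by-line body parser.
    body = lines[len(keys):]
    return header, body


config = """TITLE
SERIES
DATE_ISSUED
TITLE_ALT
FILENAME
"""

item = """The Title of the Item
Volume 123, Number 04
1990-10-23
The Subtitle of the item
Filename_of_given_type.pdf
"""

header, body = parse_header(config, item)
print(header["DATE_ISSUED"])
```

Because the mapping is purely positional, reordering or adding keys in the config changes which item line feeds which field without touching the parser.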
Match | Name | Description | Effect |
---|---|---|---|
`//` | Comment/Note | Note by the worker about the item's accuracy | (optionally) add note to dc.description |
`-` | Element Mode | Change the element type according to the inner text | Switch which element is being added (e.g. dc.contributor to dc.description.tableofcontents) |
`[A-Z\s]+` | All Caps | Refine the qualifier using the ALL CAPS text | Switch qualifiers within an element (e.g. dc.contributor.author to dc.contributor.illustrator) |
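The three match rules can be sketched as a small state machine in Python. This is a hypothetical illustration, not the project's parser: it assumes an element-mode line looks like `- Contents -`, that the `ELEMENTS` lookup from inner text to field is as shown (the wiki only gives these fields as examples), that an ALL CAPS line becomes the qualifier lowercased with spaces removed, and that parsing starts in dc.contributor.author.

```python
from __future__ import annotations

import re

# Hypothetical mapping from element-mode inner text to a metadata field.
# The wiki names these fields only as examples of what a switch looks like.
ELEMENTS = {
    "contributors": "dc.contributor",
    "contents": "dc.description.tableofcontents",
}
ALL_CAPS = re.compile(r"[A-Z\s]+")


def parse_body(lines: list[str], keep_notes: bool = True) -> list[tuple[str, str]]:
    """Turn body lines into (field, value) pairs using the three match rules."""
    element, qualifier = "dc.contributor", "author"  # assumed starting state
    pairs: list[tuple[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("//"):
            # Comment/Note: optionally kept as a dc.description value.
            if keep_notes:
                pairs.append(("dc.description", line[2:].strip()))
        elif line.startswith("-"):
            # Element Mode: the text between the dashes picks the element.
            inner = line.strip("-").strip().lower()
            if inner not in ELEMENTS:
                raise ValueError(f"unknown element mode: {inner!r}")
            element, qualifier = ELEMENTS[inner], ""
        elif ALL_CAPS.fullmatch(line):
            # All Caps: the text becomes the qualifier (lowercased, spaces dropped).
            qualifier = "".join(line.lower().split())
        else:
            # Plain value: attach it to the current element and qualifier.
            field = f"{element}.{qualifier}" if qualifier else element
            pairs.append((field, line))
    return pairs


sample = [
    "// dates checked against the cover",
    "- Contributors -",
    "AUTHOR",
    "Jane Doe",
    "ILLUSTRATOR",
    "John Roe",
    "- Contents -",
    "Chapter 1: Beginnings",
]
for field, value in parse_body(sample):
    print(field, "=", value)
```

The order of the checks matters: `//` and `-` are tested before the capitalization rule, so a note or mode line is never mistaken for a qualifier. Under this rule, a value written entirely in capitals (an acronym, say) would be read as a qualifier, so a real implementation may need an extra guard.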