Description
- Simplest scenario: only one `titleInfo` per level (main or related item), with no qualifying attributes and no subtitle element (by my calculation, 94,123 titles):
  `//*[local-name()="relatedItem"]/*[local-name()="titleInfo" and not(@*)]/*[local-name()="title" and not(@*) and not(following-sibling::*[local-name()='subTitle'])]`
  Check the length: if the title is over 252 characters, truncate it and add the corresponding full_title field.
- Slightly more complicated: only one `titleInfo` per level (main or related item), with no qualifying attributes and a `subTitle` child (by my calculation, 3,748 titles):
  `//*[local-name()="titleInfo" and not(@*)]/*[local-name()="title" and not(@*) and following-sibling::*[local-name()='subTitle' and text()]]`
  Concatenate `title` and `subtitle`, separated by a dot, then calculate the total length and truncate / add the full title field as discussed above.
- Even more complicated: multiple `titleInfo` per level.
  There are a total of 5 objects in CWRC that contain a `titleInfo` element that is preceded by another `titleInfo` sibling but doesn’t have a type attribute (`//*[local-name()='titleInfo' and preceding-sibling::*[local-name()="titleInfo"] and not(@type)]`).
  These, along with the titles that are typed ‘alternative’ (`//*[local-name()='titleInfo' and @type='alternative']`), should go in the corresponding alternative title field. About 4 alternative titles have subtitles as well, so those should be concatenated like all the other title/subtitle pairs. There is no need to test the number of characters, since it’s not the main title field and can exceed 253.
  There are also 1,560 instances of `@type='abbreviated'`, which should also be mapped to an alternative title field.
- `titleInfo` with `nonSort` children (2,595 objects): concatenate the `nonSort`
  content with the title content. There is no need to fiddle with capitalization: in the title values I have seen, the capitalization is consistent with the title language conventions. Count the length and truncate if needed.
- 620 descendants of `titleInfo` are enclosed in TEI elements. @ilovan to add a "Display title" field with full HTML formatting and provide mappings for TEI elements.
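
The concatenation and truncation rules above can be sketched in Python. This is only an illustration under stated assumptions, not the project's XQuery. The helper names (`build_title`, `title_fields`), the `". "` separator, the spacing rule after `nonSort`, and the use of 252 as the limit are all assumptions drawn from the wording above.

```python
import xml.etree.ElementTree as ET

MAX_TITLE_LENGTH = 252  # assumption: taken from the "over 252 characters" rule


def local(element):
    """Return the element name without its namespace, like XPath local-name()."""
    return element.tag.rsplit("}", 1)[-1]


def child_text(parent, name):
    """Text of the first direct child called `name` (any namespace), else ''.

    itertext() also collects text nested in descendant elements, so titles
    wrapped in TEI markup are flattened to plain text here.
    """
    for child in parent:
        if local(child) == name:
            return "".join(child.itertext())
    return ""


def build_title(title_info):
    """Join nonSort + title, then append the subTitle after a dot."""
    non_sort = child_text(title_info, "nonSort")
    title = child_text(title_info, "title").strip()
    sub_title = child_text(title_info, "subTitle").strip()

    if non_sort.strip():
        # Assumption: keep the spacing the record supplies ("The ", "L'"),
        # and add a space only when nonSort ends in a letter or digit.
        joiner = " " if non_sort[-1].isalnum() else ""
        title = non_sort.lstrip() + joiner + title
    if sub_title:
        title = f"{title}. {sub_title}"  # assumption: ". " as the separator
    return title


def title_fields(title_info, limit=MAX_TITLE_LENGTH):
    """Return the truncated title, plus full_title when truncation happened."""
    full = build_title(title_info)
    if len(full) <= limit:
        return {"title": full}
    return {"title": full[:limit], "full_title": full}


# Usage with a made-up record (not taken from CWRC data):
sample = ET.fromstring(
    '<mods xmlns="http://www.loc.gov/mods/v3"><titleInfo>'
    "<nonSort>The</nonSort><title>Waves</title><subTitle>A Novel</subTitle>"
    "</titleInfo></mods>"
)
print(title_fields(sample[0]))  # {'title': 'The Waves. A Novel'}
```

Alternative and abbreviated titles could call `build_title` directly, since they skip the length check.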
To Dos:
- check if we can have one or more alternative titles.
- handle modCollection better
- handle nonSort (concat with title)
- check `titleInfo` types to see if all are handled properly (some are in place but I'm not certain of coverage)
- check usage and type attributes
- check relatedItem containing descendant relatedItem
- general checks to verify the current work (see test_column_title.xquery - currently filtering on orlando namespace)
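
Some of the checks in this list (type coverage, nested `relatedItem`) could be spot-checked outside the XQuery with a small script. The sketch below is a hypothetical helper, not part of test_column_title.xquery. The function names are invented for illustration, and it assumes each record is available as a parsed XML tree.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local(element):
    """Return the element name without its namespace, like XPath local-name()."""
    return element.tag.rsplit("}", 1)[-1]


def title_info_type_counts(root):
    """Count titleInfo elements by @type; untyped ones are counted under None."""
    return Counter(
        el.get("type") for el in root.iter() if local(el) == "titleInfo"
    )


def has_nested_related_item(root):
    """True if any relatedItem has another relatedItem among its descendants."""
    for item in root.iter():
        if local(item) != "relatedItem":
            continue
        # item.iter() yields item itself first, so exclude it explicitly.
        if any(local(d) == "relatedItem" for d in item.iter() if d is not item):
            return True
    return False
```

Any `@type` value that shows up in the counts but has no mapping in the spreadsheet would be a gap in coverage.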
Spreadsheet with mappings and objects inventory: https://docs.google.com/spreadsheets/d/1S-TYcNnv3g8EQPUwqbJDVO5xpDwIHVTL/edit#gid=2097076917