
titles handling #14



  1. Simplest scenario: only one titleInfo per level (main or related item), with no qualifying attributes and no subTitle element (by my count, 94,123 titles):
    //*[local-name()="relatedItem"]/*[local-name()="titleInfo" and not(@*)]/*[local-name()="title" and not(@*) and not(following-sibling::*[local-name()='subTitle'])]
    (check the length: if it exceeds 252 characters, truncate it and populate the corresponding full_title field)

  2. Slightly more complicated: only one titleInfo per level (main or related item), with no qualifying attributes and a subTitle child (by my count, 3,748 titles):
    //*[local-name()="titleInfo" and not(@*)]/*[local-name()="title" and not(@*) and following-sibling::*[local-name()='subTitle' and text()]]
    (concatenate the title and subtitle, separated by a dot, then calculate the total length and truncate / add the full_title field as described in scenario 1)

  3. Even more complicated: multiple titleInfo elements per level.
    There are a total of 5 objects in CWRC that contain a titleInfo element preceded by another titleInfo sibling but lacking a type attribute (//*[local-name()='titleInfo' and preceding-sibling::*[local-name()="titleInfo"] and not(@type)]).
    These, along with the titles typed 'alternative' (//*[local-name()='titleInfo' and @type='alternative']), should go in the corresponding alternative title field. About 4 alternative titles also have subtitles, so those should be concatenated like all the other title/subtitle pairs. There is no need to check their length, since this is not the main title field and can exceed the limit.
    There are also 1,560 instances of @type='abbreviated', which should also be mapped to an alternative title field.

  4. titleInfo with nonSort children (2,595 objects): concatenate the nonSort content with the title content. There is no need to adjust capitalization: in the titles I have seen, capitalization already follows the conventions of the title's language. Count the length and truncate if needed.

  5. 620 descendants of titleInfo are enclosed in TEI elements: @ilovan to add a "Display title" field with full HTML formatting and provide mappings for the TEI elements.
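A minimal sketch of scenarios 1 to 4 in Python, using only the standard library's `xml.etree.ElementTree` (which has no `local-name()`, so a small helper strips namespaces instead). Assumptions not stated in the issue: the "dot" separator is rendered as `". "`, a space is inserted after nonSort unless it ends in an apostrophe, titleInfo types other than alternative/abbreviated are set aside for review, and any TEI markup inside a title is flattened to plain text (the scenario 5 Display title is not covered). The names `map_titles`, `alternative_titles` and `unhandled` are hypothetical.

```python
import xml.etree.ElementTree as ET

MAX_TITLE_LEN = 252  # from the issue: titles over 252 characters are truncated
ALT_TYPES = {"alternative", "abbreviated"}  # @type values routed to alternative titles


def local(tag):
    """Strip any '{namespace}' prefix, mirroring XPath local-name()."""
    return tag.rsplit("}", 1)[-1]


def child_text(elem, name):
    """Whitespace-normalized text of the first child with this local name, or ''.
    itertext() also picks up text nested in TEI elements (flattened, no markup)."""
    for child in elem:
        if local(child.tag) == name:
            return " ".join("".join(child.itertext()).split())
    return ""


def title_string(title_info):
    """Build one title string: nonSort + title, then subTitle after a dot."""
    non_sort = child_text(title_info, "nonSort")
    title = child_text(title_info, "title")
    if non_sort:
        # Assumption: add a space unless nonSort ends in an apostrophe (e.g. "L'").
        sep = "" if non_sort.endswith(("'", "\u2019")) else " "
        title = non_sort + sep + title
    sub = child_text(title_info, "subTitle")
    if sub:
        title = f"{title}. {sub}"  # assumption: "separated by a dot" means ". "
    return title


def map_titles(level):
    """Map the titleInfo children of one level (the mods root or a relatedItem)."""
    fields = {"title": None, "full_title": None,
              "alternative_titles": [], "unhandled": []}
    seen_title_info = False
    for ti in level:
        if local(ti.tag) != "titleInfo":
            continue
        ttype = ti.get("type")
        text = title_string(ti)
        if ttype is None and not seen_title_info:
            # Scenarios 1, 2 and 4: main title, truncated when too long.
            if len(text) > MAX_TITLE_LEN:
                fields["full_title"] = text
                text = text[:MAX_TITLE_LEN]
            fields["title"] = text
        elif ttype is None or ttype in ALT_TYPES:
            # Scenario 3: untyped later siblings plus alternative/abbreviated
            # titles; no length check for this field.
            fields["alternative_titles"].append(text)
        else:
            # Other types (e.g. translated, uniform) are kept for manual review.
            fields["unhandled"].append((ttype, text))
        seen_title_info = True
    return fields


if __name__ == "__main__":
    doc = ET.fromstring(
        '<mods xmlns="http://www.loc.gov/mods/v3">'
        "<titleInfo><nonSort>The</nonSort><title>Waves</title>"
        "<subTitle>A Novel</subTitle></titleInfo>"
        '<titleInfo type="abbreviated"><title>Waves</title></titleInfo>'
        "</mods>"
    )
    print(map_titles(doc))
    # {'title': 'The Waves. A Novel', 'full_title': None,
    #  'alternative_titles': ['Waves'], 'unhandled': []}
```

The same function can be called on the mods root and on each relatedItem, since both are "levels" in the sense used above.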

To Dos:

  • check if we can have one or more alternative titles.
  • handle modsCollection better
  • handle nonSort (concat with title)
  • check titleInfo types to see if all are handled properly (some are in place, but I'm not certain of coverage)
  • check usage and type attributes
  • check relatedItem containing descendant relatedItem
  • general checks to verify the current work (see test_column_title.xquery - currently filtering on orlando namespace)
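For the checks on titleInfo types, usage attributes and nested relatedItems, a quick inventory pass could look like the Python sketch below. It assumes one parsed MODS document per call; `audit` and its return keys are hypothetical, and it only tallies values, leaving the judgment about coverage to a person.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip any '{namespace}' prefix, like XPath local-name()."""
    return tag.rsplit("}", 1)[-1]


def audit(root):
    """Tally titleInfo @type and @usage values, and count relatedItem
    elements that contain another relatedItem somewhere below them."""
    types, usages = Counter(), Counter()
    nested_related = 0
    for elem in root.iter():
        name = local(elem.tag)
        if name == "titleInfo":
            types[elem.get("type", "(none)")] += 1
            usages[elem.get("usage", "(none)")] += 1
        elif name == "relatedItem":
            # elem.iter() yields elem itself first, so exclude it.
            if any(local(d.tag) == "relatedItem"
                   for d in elem.iter() if d is not elem):
                nested_related += 1
    return {"types": types, "usages": usages, "nested_related": nested_related}
```

Per-object results can be combined across the collection by adding the Counters with `+` and summing `nested_related`.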

Spreadsheet with mappings and objects inventory:


