
titles handling #14

Open
@ilovan

Description

  1. simplest scenario: only one titleInfo per level (main or related item), with no qualifying attributes and no subtitle element (by my calculation - 94,123 titles):
    //*[local-name()="relatedItem"]/*[local-name()="titleInfo" and not(@*)]/*[local-name()="title" and not(@*) and not(following-sibling::*[local-name()='subTitle'])]
    (check the length: if it is over 252 characters, truncate and add the corresponding full_title field)

  2. slightly more complicated: only one titleInfo per level (main or related item), with no qualifying attributes and a subTitle child (by my calculation - 3,748 titles)
    //*[local-name()="titleInfo" and not(@*)]/*[local-name()="title" and not(@*) and following-sibling::*[local-name()='subTitle' and text()]]
    (concatenate title and subtitle, separated by a dot, then calculate the total length and truncate / add the full_title field as described in scenario 1)

  3. even more complicated - multiple titleInfo per level
    There are a total of 5 objects in CWRC that contain a titleInfo element that is preceded by another titleInfo sibling but doesn’t have a type attribute (//*[local-name()='titleInfo' and preceding-sibling::*[local-name()="titleInfo"] and not(@type)]).
    These, along with the titles that are typed ‘alternative’ (//*[local-name()='titleInfo' and @type='alternative']), should go in the corresponding alternative title field. About 4 alternative titles have subtitles as well, so those should be concatenated like all the other title/subtitle pairs. There is no need to test the number of characters, since this is not the main title field and can exceed 253.
    There are also 1560 instances of @type='abbreviated', which should also be mapped to an alternative title field.

  4. titleInfo with nonSort children (2,595 objects): concatenate the nonSort content with the title content. There is no need to fiddle with capitalization: in the title values I have seen, the capitalization is consistent with the title language conventions. Count the length and truncate if needed.

  5. 620 descendants of titleInfo are enclosed in TEI elements - @ilovan to add a "Display title" field with full HTML formatting and provide mappings for TEI elements.
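
The rules in scenarios 1, 2 and 4 can be sketched in a few lines of Python. This is only an illustration, not the project's XQuery. It makes several assumptions: the output field name `title` is hypothetical (only `full_title` appears above), the dot separator is written as ". ", and a space is added after nonSort unless it already ends in whitespace or an apostrophe.

```python
import xml.etree.ElementTree as ET

MAX_TITLE_LEN = 252  # limit quoted in scenario 1


def local_name(element):
    """Tag without its namespace, like local-name() in the XPath above."""
    return element.tag.rsplit("}", 1)[-1]


def child_text(parent, name):
    """Text of the first direct child with this local name, or '' if absent."""
    for child in parent:
        if local_name(child) == name:
            # itertext() also collects text nested in inline markup (e.g. TEI)
            return "".join(child.itertext())
    return ""


def build_title(title_info):
    """Concatenate nonSort + title, then append the subtitle after a dot."""
    non_sort = child_text(title_info, "nonSort")
    title = child_text(title_info, "title").strip()
    sub_title = child_text(title_info, "subTitle").strip()

    if non_sort.strip():
        lead = non_sort.lstrip()
        # Assumption: add a space unless nonSort already ends in
        # whitespace or an apostrophe (e.g. "The " or "L'")
        if not (lead[-1].isspace() or lead[-1] in "'\u2019"):
            lead += " "
        title = lead + title
    if sub_title:
        title = f"{title}. {sub_title}"
    return title


def title_fields(title_info):
    """Return the main title, plus full_title only when truncation happened."""
    full = build_title(title_info)
    if len(full) > MAX_TITLE_LEN:
        return {"title": full[:MAX_TITLE_LEN], "full_title": full}
    return {"title": full}


sample = ET.fromstring(
    '<titleInfo xmlns="http://www.loc.gov/mods/v3">'
    "<nonSort>The </nonSort><title>Waves</title><subTitle>A Novel</subTitle>"
    "</titleInfo>"
)
print(title_fields(sample))  # {'title': 'The Waves. A Novel'}
```

The sketch does not handle titles that already end in punctuation (which would yield a doubled dot before the subtitle); that case would need a decision before it is ported to the real transform.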

To Dos:

  • check if we can have one or more alternative titles.
  • handle modCollection better
  • handle nonSort (concat with title)
  • check titleInfo types to see if all handled properly (some are in place but I'm not certain of coverage)
  • check usage and type attributes
  • check relatedItem containing descendant relatedItem
  • general checks to verify the current work (see test_column_title.xquery - currently filtering on orlando namespace)
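
For the type-coverage check, one possible approach is to bucket the titleInfo children of each mods or relatedItem element and report anything that falls outside the rules in scenario 3. The Python below is a sketch under assumptions, not the existing test_column_title.xquery: the bucket names are hypothetical, and any type other than 'alternative' or 'abbreviated' is simply flagged as unhandled.

```python
import xml.etree.ElementTree as ET

# The two type values that scenario 3 maps to an alternative title field
ALTERNATIVE_TYPES = {"alternative", "abbreviated"}


def local_name(element):
    """Tag without its namespace, like local-name() in the XPath above."""
    return element.tag.rsplit("}", 1)[-1]


def classify_title_infos(parent):
    """Sort the direct titleInfo children of one level into buckets."""
    buckets = {"main": [], "alternative": [], "unhandled": []}
    seen_title_info = False
    for child in parent:
        if local_name(child) != "titleInfo":
            continue
        type_attr = child.get("type")
        if type_attr in ALTERNATIVE_TYPES:
            buckets["alternative"].append(child)
        elif type_attr is None:
            # Mirrors the XPath literally: an untyped titleInfo preceded by
            # ANY titleInfo sibling counts as alternative, otherwise as main
            buckets["alternative" if seen_title_info else "main"].append(child)
        else:
            # Types not covered by the rules above are flagged for review
            buckets["unhandled"].append(child)
        seen_title_info = True
    return buckets


record = ET.fromstring(
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<titleInfo><title>Main</title></titleInfo>"
    '<titleInfo type="abbreviated"><title>M.</title></titleInfo>'
    "<titleInfo><title>Second untyped</title></titleInfo>"
    '<titleInfo type="uniform"><title>U</title></titleInfo>'
    "</mods>"
)
counts = {name: len(items) for name, items in classify_title_infos(record).items()}
print(counts)  # {'main': 1, 'alternative': 2, 'unhandled': 1}
```

Because only direct children are inspected, the function has to be called once per level (the mods element and each relatedItem, including nested ones), which also touches on the nested relatedItem to-do.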

Spreadsheet with mappings and objects inventory: https://docs.google.com/spreadsheets/d/1S-TYcNnv3g8EQPUwqbJDVO5xpDwIHVTL/edit#gid=2097076917
