ListItem in content-tree is too restrictive vs XML <li> (content like ImageSet/Video present in data store) #136

@lokendersinghft

Problem

XML allows <li> to contain embedded content (incl. ImageSet/Video).

In the current XML schema, <li> (type Li) can contain both paragraph blocks and embedded content nodes:

    <xs:complexType name="Li" mixed="true">
        <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element name="p" type="PType" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="content" type="Content" minOccurs="0" maxOccurs="unbounded"/>
        </xs:choice>
    </xs:complexType>

ContentType includes the following values:

    <xs:simpleType name="ContentType">
        <xs:restriction base="xs:string">
            <xs:enumeration value="http://www.ft.com/ontology/content/ImageSet"/>
            <xs:enumeration value="http://www.ft.com/ontology/content/Video"/>
            <xs:enumeration value="http://www.ft.com/ontology/content/Content"/>
            <xs:enumeration value="http://www.ft.com/ontology/content/Article"/>
            <xs:enumeration value="http://www.ft.com/ontology/content/ClipSet"/>
            <xs:enumeration value="http://www.ft.com/ontology/content/CustomCodeComponent"/>
        </xs:restriction>
    </xs:simpleType>

This means the following is valid XML:

<body>
  <ul>
    <li>
      <content data-embedded="true"
               type="http://www.ft.com/ontology/content/ImageSet"
               id="23925844-4d8d-11ea-0bc6-d44b54b3bebc" />
    </li>
  </ul>
</body>

content-tree ListItem only allows Paragraph, phrasing nodes and Link:

"anyOf": [
    {
        "$ref": "#/definitions/ContentTree.transit.Paragraph"
    },
    {
        "$ref": "#/definitions/ContentTree.transit.Text"
    },
    {
        "$ref": "#/definitions/ContentTree.transit.Break"
    },
    {
        "$ref": "#/definitions/ContentTree.transit.Strong"
    },
    {
        "$ref": "#/definitions/ContentTree.transit.Emphasis"
    },
    {
        "$ref": "#/definitions/ContentTree.transit.Strikethrough"
    },
    {
        "$ref": "#/definitions/ContentTree.transit.Link"
    }
]

This prevents representing list items that contain content like ImageSet or Video.
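
To make the restriction concrete, here is a minimal Python sketch that checks a list-item node against the child types listed above. The `"link"` and `"list-item"` type strings appear in this issue; the other type strings (including `"image-set"`) and the helper name `disallowed_children` are assumptions for illustration, not taken from the content-tree schema.

```python
from typing import List

# Child node types accepted by the current ListItem definition.
# Assumption: each type string is the lower-case form of the schema name.
ALLOWED_LIST_ITEM_CHILDREN = {
    "paragraph", "text", "break", "strong",
    "emphasis", "strikethrough", "link",
}

def disallowed_children(list_item: dict) -> List[str]:
    """Return the types of children the current ListItem would reject."""
    return [
        child.get("type")
        for child in list_item.get("children", [])
        if child.get("type") not in ALLOWED_LIST_ITEM_CHILDREN
    ]

# Hypothetical list item holding an embedded image set ("image-set" is assumed).
example_item = {
    "type": "list-item",
    "children": [
        {"type": "image-set", "id": "23925844-4d8d-11ea-0bc6-d44b54b3bebc"},
    ],
}

print(disallowed_children(example_item))
```

An empty result would mean the list item fits the current definition; any entry in the result is a node that cannot be represented today.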

Impact

We cannot perform a lossless conversion of some existing XML content into content-tree.

We have identified at least:

  • 663 ImageSet occurrences in <li> (example UUID: 7c0cafbc-f195-481a-b00e-ac7cc4cd8b4c)
  • 17 Video occurrences in <li> (example UUID: 3249049c-c2e7-11e7-b2bb-322b2cb39656)
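
For anyone wanting to reproduce counts like these on their own data, a scan could be sketched roughly as follows. This is not the query that produced the numbers above; the function name `count_embeds_in_list_items` is hypothetical, and it only looks at `<content>` elements that are direct children of `<li>`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_embeds_in_list_items(body_xml: str) -> Counter:
    """Count <content> direct children of <li>, keyed by short type name."""
    root = ET.fromstring(body_xml)
    counts = Counter()
    for li in root.iter("li"):
        for child in li.findall("content"):
            type_uri = child.get("type", "")
            # Keep only the last path segment, e.g. "ImageSet" or "Video".
            counts[type_uri.rsplit("/", 1)[-1]] += 1
    return counts

sample = (
    '<body><ul><li>'
    '<content data-embedded="true" '
    'type="http://www.ft.com/ontology/content/ImageSet" '
    'id="23925844-4d8d-11ea-0bc6-d44b54b3bebc" />'
    '</li></ul></body>'
)

print(count_embeds_in_list_items(sample))
```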

Notes:

Legacy Content and Article types are transformed into a Link node.
For example, the XML below

<body>
    <ul>
        <li>
            <content data-embedded="true" type="http://www.ft.com/ontology/content/Content"
                id="23925844-4d8d-11ea-0bc6-d44b54b3bebc" />
        </li>
        <li>
            <content data-embedded="true" type="http://www.ft.com/ontology/content/Article"
                id="23925844-4d8d-11ea-0bc6-d44b54b3bebc" />
        </li>
    </ul>
</body>

would be converted to:

{
    "type": "root",
    "body": {
        "type": "body",
        "children": [
            {
                "type": "list",
                "children": [
                    {
                        "type": "list-item",
                        "children": [
                            {
                                "type": "link",
                                "children": [],
                                "title": "",
                                "url": "https://www.ft.com/content/23925844-4d8d-11ea-0bc6-d44b54b3bebc"
                            }
                        ]
                    },
                    {
                        "type": "list-item",
                        "children": [
                            {
                                "type": "link",
                                "children": [],
                                "title": "",
                                "url": "https://www.ft.com/content/23925844-4d8d-11ea-0bc6-d44b54b3bebc"
                            }
                        ]
                    }
                ],
                "ordered": false
            }
        ],
        "version": 1
    }
}
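
The mapping shown above can be sketched in Python as follows. This is an illustration of the observed input and output only, not the actual transformer code; the function name `content_to_link` is hypothetical, and the URL pattern is inferred from the example output.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ONTOLOGY = "http://www.ft.com/ontology/content/"
# Types that the example above shows being turned into Link nodes.
LINK_TYPES = {ONTOLOGY + "Content", ONTOLOGY + "Article"}

def content_to_link(element: ET.Element) -> Optional[dict]:
    """Map a legacy <content> element to a link node, or None for other types."""
    if element.get("type") not in LINK_TYPES:
        return None
    return {
        "type": "link",
        "children": [],
        "title": "",
        "url": "https://www.ft.com/content/" + element.get("id", ""),
    }

article = ET.fromstring(
    '<content data-embedded="true" '
    'type="http://www.ft.com/ontology/content/Article" '
    'id="23925844-4d8d-11ea-0bc6-d44b54b3bebc" />'
)
image_set = ET.fromstring(
    '<content data-embedded="true" '
    'type="http://www.ft.com/ontology/content/ImageSet" '
    'id="23925844-4d8d-11ea-0bc6-d44b54b3bebc" />'
)

print(content_to_link(article))
print(content_to_link(image_set))
```

In this sketch an ImageSet falls through to `None`, which mirrors the gap described in this issue: there is no list-item child it could be mapped to.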

The problematic cases are ImageSet (and to a lesser extent Video) embedded directly in list items.

Possible Solution:

Expand ContentTree.transit.ListItem.children to include block/embed nodes such as:

  • ImageSet
  • Video
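
As a rough sketch of what the expanded definition might look like, the Python below builds the current `anyOf` list and appends two extra references. The `$ref` names `ContentTree.transit.ImageSet` and `ContentTree.transit.Video` are assumptions that follow the naming pattern of the existing entries, and `with_embeds` is a hypothetical helper.

```python
import copy
import json

PREFIX = "#/definitions/ContentTree.transit."

# The child references currently allowed for ListItem, as listed in this issue.
current_any_of = [
    {"$ref": PREFIX + name}
    for name in (
        "Paragraph", "Text", "Break", "Strong",
        "Emphasis", "Strikethrough", "Link",
    )
]

def with_embeds(any_of, extra_names):
    """Return a copy of any_of with extra $ref entries appended, skipping duplicates."""
    expanded = copy.deepcopy(any_of)
    existing = {entry["$ref"] for entry in expanded}
    for name in extra_names:
        ref = PREFIX + name
        if ref not in existing:
            expanded.append({"$ref": ref})
            existing.add(ref)
    return expanded

# Assumed definition names for the two embed types proposed above.
proposed_any_of = with_embeds(current_any_of, ["ImageSet", "Video"])
print(json.dumps(proposed_any_of, indent=4))
```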
