Description
Problem
XML allows <li> to contain embedded content (incl. ImageSet/Video).
In the current XML schema, <li> (type Li) can contain both paragraph blocks and embedded content nodes:
<xs:complexType name="Li" mixed="true">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element name="p" type="PType" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="content" type="Content" minOccurs="0" maxOccurs="unbounded"/>
</xs:choice>
</xs:complexType>
The ContentType enumeration includes:
<xs:simpleType name="ContentType">
<xs:restriction base="xs:string">
<xs:enumeration value="http://www.ft.com/ontology/content/ImageSet"/>
<xs:enumeration value="http://www.ft.com/ontology/content/Video"/>
<xs:enumeration value="http://www.ft.com/ontology/content/Content"/>
<xs:enumeration value="http://www.ft.com/ontology/content/Article"/>
<xs:enumeration value="http://www.ft.com/ontology/content/ClipSet"/>
<xs:enumeration value="http://www.ft.com/ontology/content/CustomCodeComponent"/>
</xs:restriction>
</xs:simpleType>
This means the following is valid XML:
<body>
<ul>
<li>
<content data-embedded="true"
type="http://www.ft.com/ontology/content/ImageSet"
id="23925844-4d8d-11ea-0bc6-d44b54b3bebc" />
</li>
</ul>
</body>
content-tree ListItem only allows Paragraph and phrasing nodes (including Link)
"anyOf": [
{
"$ref": "#/definitions/ContentTree.transit.Paragraph"
},
{
"$ref": "#/definitions/ContentTree.transit.Text"
},
{
"$ref": "#/definitions/ContentTree.transit.Break"
},
{
"$ref": "#/definitions/ContentTree.transit.Strong"
},
{
"$ref": "#/definitions/ContentTree.transit.Emphasis"
},
{
"$ref": "#/definitions/ContentTree.transit.Strikethrough"
},
{
"$ref": "#/definitions/ContentTree.transit.Link"
}
]
},
This prevents representing list items that contain content like ImageSet or Video.
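To make the restriction concrete, here is a minimal sketch of a check against the current ListItem definition. The node type strings ("paragraph", "image-set", and so on) and the helper name `invalid_list_item_children` are assumptions for illustration; they are not taken from the content-tree codebase.

```python
# Node "type" strings assumed to correspond to the seven schema
# definitions listed in the anyOf excerpt above.
ALLOWED_LIST_ITEM_CHILDREN = {
    "paragraph", "text", "break", "strong",
    "emphasis", "strikethrough", "link",
}


def invalid_list_item_children(list_item: dict) -> list:
    """Return the types of children the current ListItem definition would reject."""
    return [
        child["type"]
        for child in list_item.get("children", [])
        if child["type"] not in ALLOWED_LIST_ITEM_CHILDREN
    ]


# Hypothetical list item holding text plus an embedded image set.
item = {
    "type": "list-item",
    "children": [
        {"type": "text", "value": "See the chart:"},
        {"type": "image-set", "id": "23925844-4d8d-11ea-0bc6-d44b54b3bebc"},
    ],
}

print(invalid_list_item_children(item))  # ['image-set']
```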
Impact
We cannot perform a lossless conversion of some existing XML content into content-tree.
We have identified at least:
- 663 ImageSet occurrences in <li> (example UUID: 7c0cafbc-f195-481a-b00e-ac7cc4cd8b4c)
- 17 Video occurrences in <li> (example UUID: 3249049c-c2e7-11e7-b2bb-322b2cb39656)
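For anyone wanting to reproduce counts like these on a single body, the following is a rough sketch using Python's standard `xml.etree.ElementTree`. It is not the query that produced the figures above; the function name `count_embeds_in_list_items` and the sample body are hypothetical, and only `<content>` elements that are direct children of `<li>` are counted.

```python
import xml.etree.ElementTree as ET
from collections import Counter

ONTOLOGY_PREFIX = "http://www.ft.com/ontology/content/"
# The two content types that cannot currently be represented in a list item.
BLOCKED_TYPES = {ONTOLOGY_PREFIX + "ImageSet", ONTOLOGY_PREFIX + "Video"}


def count_embeds_in_list_items(body_xml: str) -> Counter:
    """Count ImageSet/Video <content> elements that sit directly inside an <li>."""
    root = ET.fromstring(body_xml)
    counts = Counter()
    for li in root.iter("li"):
        # findall with a bare tag name matches direct children only.
        for content in li.findall("content"):
            content_type = content.get("type")
            if content_type in BLOCKED_TYPES:
                counts[content_type[len(ONTOLOGY_PREFIX):]] += 1
    return counts


sample = """<body><ul>
<li><content data-embedded="true"
    type="http://www.ft.com/ontology/content/ImageSet"
    id="23925844-4d8d-11ea-0bc6-d44b54b3bebc" /></li>
<li><p>Plain text item</p></li>
</ul></body>"""

print(dict(count_embeds_in_list_items(sample)))  # {'ImageSet': 1}
```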
Notes:
Legacy Content and Article types are transformed into a Link node, so they are not affected.
For example, the XML below
<body>
<ul>
<li>
<content data-embedded="true" type="http://www.ft.com/ontology/content/Content"
id="23925844-4d8d-11ea-0bc6-d44b54b3bebc" />
</li>
<li>
<content data-embedded="true" type="http://www.ft.com/ontology/content/Article"
id="23925844-4d8d-11ea-0bc6-d44b54b3bebc" />
</li>
</ul>
</body>
would be converted to
{
"type": "root",
"body": {
"type": "body",
"children": [
{
"type": "list",
"children": [
{
"type": "list-item",
"children": [
{
"type": "link",
"children": [],
"title": "",
"url": "https://www.ft.com/content/23925844-4d8d-11ea-0bc6-d44b54b3bebc"
}
]
},
{
"type": "list-item",
"children": [
{
"type": "link",
"children": [],
"title": "",
"url": "https://www.ft.com/content/23925844-4d8d-11ea-0bc6-d44b54b3bebc"
}
]
}
],
"ordered": false
}
],
"version": 1
}
}
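The mapping shown in this example can be sketched as follows. This is an illustration under stated assumptions, not the actual transformer: `content_to_link` is a hypothetical helper, and the URL pattern is simply the one visible in the output above.

```python
import xml.etree.ElementTree as ET

ONTOLOGY_PREFIX = "http://www.ft.com/ontology/content/"
# Legacy types that, per the note above, become Link nodes.
LINKED_TYPES = {ONTOLOGY_PREFIX + "Content", ONTOLOGY_PREFIX + "Article"}


def content_to_link(content: ET.Element):
    """Build a link node for a Content/Article element, or return None otherwise."""
    if content.get("type") not in LINKED_TYPES:
        return None
    return {
        "type": "link",
        "children": [],
        "title": "",
        # URL pattern copied from the converted example; assumed to generalise.
        "url": "https://www.ft.com/content/" + content.get("id"),
    }


element = ET.fromstring(
    '<content data-embedded="true" '
    'type="http://www.ft.com/ontology/content/Article" '
    'id="23925844-4d8d-11ea-0bc6-d44b54b3bebc" />'
)
print(content_to_link(element)["url"])
```

An ImageSet or Video element returns None here, which is exactly the gap this issue describes: there is no list-item child type to map it to.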
The problematic cases are ImageSet (and to a lesser extent Video) embedded directly in list items.
Possible Solution:
Expand ContentTree.transit.ListItem.children to include block/embed nodes such as:
- ImageSet
- Video
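
If ListItem children were expanded this way, a converter could emit embed nodes directly inside a list item. The sketch below shows one possible shape; the node types "image-set" and "video", the use of the raw XML id as the node id, and the helper `li_to_list_item` are all assumptions that would need checking against the content-tree definitions.

```python
import xml.etree.ElementTree as ET

ONTOLOGY_PREFIX = "http://www.ft.com/ontology/content/"
# Hypothetical mapping from XML content types to content-tree node types.
EMBED_NODE_TYPES = {
    ONTOLOGY_PREFIX + "ImageSet": "image-set",
    ONTOLOGY_PREFIX + "Video": "video",
}


def li_to_list_item(li: ET.Element) -> dict:
    """Convert the embedded <content> children of an <li> into list-item children.

    Text and <p> handling is omitted to keep the sketch focused on embeds.
    """
    children = []
    for content in li.findall("content"):
        node_type = EMBED_NODE_TYPES.get(content.get("type"))
        if node_type is not None:
            # Assumed node shape: a type plus the id taken from the XML attribute.
            children.append({"type": node_type, "id": content.get("id")})
    return {"type": "list-item", "children": children}


li = ET.fromstring(
    '<li><content data-embedded="true" '
    'type="http://www.ft.com/ontology/content/ImageSet" '
    'id="23925844-4d8d-11ea-0bc6-d44b54b3bebc" /></li>'
)
print(li_to_list_item(li))
```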