
Updates to attributes table parser needed as of December 2024 #6

@golightlyb


Here are a few issues with parsing the current spec that have been highlighted by @abhillman's work.

These issues need to be addressed for the updated machine-readable spec to be fully useful.

Current output

The JSON files generated by the current parser sometimes contain incorrect output.

Issues

Outdated workaround

Remove the workaround at line 91 of parse.py.

Update global attributes

The list of global attributes at line 34 of parse.py needs updating.

This can be done manually for now, but it would be nice to be able to parse this automatically in future.

Handling "the empty string" as an attribute keyword

When parsing keyword lists such as "true"; "false"; the empty string, the text "the empty string" causes the keyword list to fail to match the regular expression. Instead, this phrase should be recognised, and the empty string should be emitted as a value_keywords entry of "".

This leads to incorrect output, for example at line 614 of attributes.json:

    "hidden":
    {
        "desc": "Whether the element is relevant",
        "elements": ["HTML"],
        "value_keywords": [],
        "value_type": "\"until-found\"; \"hidden\"; the empty string"
    },

It should instead read:

    "hidden":
    {
        "desc": "Whether the element is relevant",
        "elements": ["HTML"],
        "value_keywords": ["", "until-found", "hidden"],
        "value_type": "Keywords"
    },
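One possible way to recognise the phrase is sketched below in Python. It assumes the value_type text is a semicolon-separated list whose items are either double-quoted keywords or the literal phrase "the empty string". The names parse_value_keywords, QUOTED_KEYWORD and EMPTY_STRING_PHRASE are hypothetical and not taken from parse.py.

```python
import re
from typing import List, Optional

# Hypothetical helper, not the actual parse.py code.
# Matches a single double-quoted keyword and captures its contents.
QUOTED_KEYWORD = re.compile(r'"([^"]*)"')
# Prose phrase the spec uses in place of a quoted "" keyword.
EMPTY_STRING_PHRASE = "the empty string"


def parse_value_keywords(value_type: str) -> Optional[List[str]]:
    """Return the keywords in value_type, or None if it is not a keyword list."""
    keywords: List[str] = []
    has_empty_string = False
    for item in value_type.split(";"):
        item = item.strip()
        if item == EMPTY_STRING_PHRASE:
            # Recognise the phrase instead of rejecting the whole list.
            has_empty_string = True
            continue
        match = QUOTED_KEYWORD.fullmatch(item)
        if match is None:
            # Not a pure keyword list, e.g. "Valid non-negative integer".
            return None
        keywords.append(match.group(1))
    # Put "" first to mirror the desired output shown in this issue.
    return ([""] if has_empty_string else []) + keywords
```

For the hidden attribute, parse_value_keywords('"until-found"; "hidden"; the empty string') gives ["", "until-found", "hidden"]. Placing the empty string first is only to match the expected output above; spec order would work equally well.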

Parentheses in attribute element lists are not parsed correctly

See the attributes table in the WHATWG HTML specification.

In attributes.json, the element list for the height attribute comes from parsing the HTML text "canvas; embed; iframe; img; input; object; source (in picture); video".

Currently it is parsed as:

    "height":
    {
        "desc": "Vertical dimension",
        "elements":
        [
            "(in",
            "canvas",
            "embed",
            "iframe",
            "img",
            "input",
            "object",
            "video"
        ],
        "value_keywords": [],
        "value_type": "Valid non-negative integer"
    },

The elements array should instead read, with "(in" removed and "source" added:

        "elements":
        [
            "canvas",
            "embed",
            "iframe",
            "img",
            "input",
            "object",
            "source",
            "video"
        ],

value_type should probably also have ". The actual rules are more complicated than indicated" appended.
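For the element list itself, one approach is to strip a trailing parenthesised qualifier such as "(in picture)" from each semicolon-separated item before recording the element name. The Python sketch below is a hedged illustration; parse_elements and QUALIFIER are hypothetical names, not the existing parse.py code, and the sketch assumes qualifiers never contain nested parentheses.

```python
import re
from typing import List

# Hypothetical pattern: a parenthesised qualifier at the end of an item,
# e.g. the " (in picture)" in "source (in picture)".
QUALIFIER = re.compile(r"\s*\([^)]*\)\s*$")


def parse_elements(cell: str) -> List[str]:
    """Split an elements cell on ';' and drop any trailing qualifier."""
    elements: List[str] = []
    for item in cell.split(";"):
        # Remove the qualifier, keeping only the element name.
        name = QUALIFIER.sub("", item).strip()
        if name:
            elements.append(name)
    return elements
```

Applied to the height attribute's text, this yields canvas, embed, iframe, img, input, object, source and video, matching the expected elements array. The qualifier text itself is discarded.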

Attribute keyword list chokes on trailing semicolon

In attributes.json (line 1108), the popover attribute is missing its keywords because of a trailing semicolon in the value list, which should be ignored.

It currently reads:

    "popover":
    {
        "desc": "Makes the element a popover element",
        "elements": ["HTML"],
        "value_keywords": [],
        "value_type": "\"auto\"; \"manual\";"
    },

But it should read:

    "popover":
    {
        "desc": "Makes the element a popover element",
        "elements": ["HTML"],
        "value_keywords": ["auto", "manual"],
        "value_type": "Keywords"
    },
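A minimal sketch of a keyword-list regular expression that tolerates an optional trailing semicolon is shown below, written as a Python post-processing step over a single attributes.json entry. KEYWORD_LIST, QUOTED and fix_keyword_entry are hypothetical names; the real parser's regular expression is not reproduced here.

```python
import re

# Hypothetical pattern: one or more double-quoted keywords separated by
# semicolons, followed by an optional trailing semicolon (the ";?").
KEYWORD_LIST = re.compile(r'^\s*"[^"]*"(?:\s*;\s*"[^"]*")*\s*;?\s*$')
# Captures the contents of each quoted keyword.
QUOTED = re.compile(r'"([^"]*)"')


def fix_keyword_entry(entry: dict) -> dict:
    """Return a copy of an attributes.json entry with keywords extracted."""
    fixed = dict(entry)  # shallow copy; the input entry is left untouched
    value_type = entry["value_type"]
    if KEYWORD_LIST.match(value_type):
        fixed["value_keywords"] = QUOTED.findall(value_type)
        fixed["value_type"] = "Keywords"
    return fixed
```

With the popover entry above, value_type '"auto"; "manual";' matches, so value_keywords becomes ["auto", "manual"] and value_type becomes "Keywords". Entries that are not plain keyword lists, such as "Valid non-negative integer", are returned unchanged.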

Intellectual property notice updates

COPYING.txt (which is copied into the JSON output) should be updated; in particular, it needs to link to the new version of the W3C document license. COPYING.md should be updated to match.
