Updates to attributes table parser needed as of December 2024

Here are a few issues with parsing the current spec that have been highlighted by @abhillman's work.

These issues need to be addressed for the updated machine-readable spec to be fully useful.

## Current output

The current parser sometimes generates incorrect output in the following generated JSON file:

- [attributes.json](https://github.com/tawesoft/html5spec/blob/777c406218c1466d5c5bfcc1824a019a7d96e68b/spec-json/attributes.json)

## Issues

### Outdated workaround

Remove the workaround in [parse.py line 91](https://github.com/tawesoft/html5spec/blob/8157a368d21bd469ac54e0ea4833c8f242ca4b6a/parse.py#L91)

### Update global attributes

[List of global attributes](https://html.spec.whatwg.org/multipage/dom.html#global-attributes) needs updating in [parse.py line 34](https://github.com/tawesoft/html5spec/blob/8157a368d21bd469ac54e0ea4833c8f242ca4b6a/parse.py#L34).

This can be done manually for now, but it would be nice to be able to parse this automatically in future.

### Handling "the empty string" as an attribute keyword

When [parsing attributes](https://github.com/tawesoft/html5spec/blob/8157a368d21bd469ac54e0ea4833c8f242ca4b6a/parse.py#L111), keyword lists such as `"true"; "false"; the empty string` fail to match the regular expression because of the text "the empty string". Instead, that text should be recognised and emitted as a `value_keywords` entry of `""`.

This leads to suboptimal output, for example in [attributes.json line 614](https://github.com/tawesoft/html5spec/blob/777c406218c1466d5c5bfcc1824a019a7d96e68b/spec-json/attributes.json#L614):

```json
    "hidden":
    {
        "desc": "Whether the element is relevant",
        "elements": ["HTML"],
        "value_keywords": [],
        "value_type": "\"until-found\"; \"hidden\"; the empty string"
    },
```

It should instead read:

```json
    "hidden":
    {
        "desc": "Whether the element is relevant",
        "elements": ["HTML"],
        "value_keywords": ["", "until-found", "hidden"],
        "value_type": "Keywords"
    },
```
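One possible way to recognise the phrase is sketched below. This is not the code in parse.py: the function name `parse_keyword_list`, the regular expression, and the choice to return `None` for non-keyword values are all assumptions made for illustration. The ordering of the emitted keywords is left to the caller.

```python
import re

# Hypothetical names: the real regular expression and helpers in parse.py may differ.
QUOTED_KEYWORD = re.compile(r'"([^"]*)"')
EMPTY_STRING_PHRASE = "the empty string"


def parse_keyword_list(text):
    """Return the keywords in text, or None if text is not a pure keyword list."""
    keywords = []
    for item in text.split(";"):
        item = item.strip()
        if item == EMPTY_STRING_PHRASE:
            # The spec's prose for the empty keyword becomes a literal "".
            keywords.append("")
            continue
        match = QUOTED_KEYWORD.fullmatch(item)
        if match is None:
            # Not a keyword list: the caller keeps the raw value_type text.
            return None
        keywords.append(match.group(1))
    return keywords


print(parse_keyword_list('"until-found"; "hidden"; the empty string'))
print(parse_keyword_list("Valid non-negative integer"))
```

When the function returns a list, the caller would set `value_type` to "Keywords" and store the list in `value_keywords`.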

### Parentheses in attribute element lists are not parsed correctly

See [WHATWG Attributes](https://html.spec.whatwg.org/multipage/indices.html#attributes-3)

In [attributes.json](https://github.com/tawesoft/html5spec/blob/777c406218c1466d5c5bfcc1824a019a7d96e68b/spec-json/attributes.json#L592), the element list for the `height` attribute requires parsing the HTML text `canvas; embed; iframe; img; input; object; source (in picture); video`.

Currently it is parsing like so:

```json
    "height":
    {
        "desc": "Vertical dimension",
        "elements":
        [
            "(in",
            "canvas",
            "embed",
            "iframe",
            "img",
            "input",
            "object",
            "video"
        ],
        "value_keywords": [],
        "value_type": "Valid non-negative integer"
    },
```

The elements array should instead read, with "(in" removed and "source" added:

```json
        "elements":
        [
            "canvas",
            "embed",
            "iframe",
            "img",
            "input",
            "object",
            "source",
            "video"
        ],
```

`value_type` should probably also have ".The actual rules are more complicated than indicated" appended.
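A minimal sketch of one way to handle the parenthesised qualifier follows. The helper name `parse_element_list` and its return shape are hypothetical, and sorting the names is an assumption based on the ordering seen in the current output. The second return value tells the caller that a qualifier was dropped, which could be used to decide whether to append the note to `value_type`.

```python
import re

# Hypothetical helper: matches a parenthesised qualifier such as " (in picture)".
QUALIFIER = re.compile(r"\s*\([^)]*\)")


def parse_element_list(text):
    """Return (sorted element names, True if any qualifier was stripped)."""
    stripped = QUALIFIER.sub("", text)
    # Split on ";" and ignore empty segments; a set removes duplicates.
    names = sorted({name.strip() for name in stripped.split(";") if name.strip()})
    return names, stripped != text


elements, qualified = parse_element_list(
    "canvas; embed; iframe; img; input; object; source (in picture); video"
)
print(elements)
print(qualified)
```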

### Attribute keyword list chokes on trailing semicolon

[attributes.json line 1108](https://github.com/tawesoft/html5spec/blob/777c406218c1466d5c5bfcc1824a019a7d96e68b/spec-json/attributes.json#L1108C27-L1108C28) fails to record the correct keywords for the `popover` attribute because of a trailing semicolon, which should be ignored.

It currently reads:

```json
    "popover":
    {
        "desc": "Makes the element a popover element",
        "elements": ["HTML"],
        "value_keywords": [],
        "value_type": "\"auto\"; \"manual\";"
    },
```

But it should read:

```json
    "popover":
    {
        "desc": "Makes the element a popover element",
        "elements": ["HTML"],
        "value_keywords": ["auto", "manual"],
        "value_type": "Keywords"
    },
```
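Ignoring the trailing semicolon could be as simple as dropping empty segments after splitting. The helper below, `split_semicolon_list`, is a hypothetical name and not taken from parse.py. It only illustrates the idea and assumes the remaining items are then matched as quoted keywords.

```python
def split_semicolon_list(text):
    """Split on ';' and drop empty segments, such as the one after a trailing ';'."""
    return [part.strip() for part in text.split(";") if part.strip()]


items = split_semicolon_list('"auto"; "manual";')
# Strip the surrounding double quotes to obtain the bare keywords.
keywords = [item.strip('"') for item in items]
print(keywords)
```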


### Intellectual property notice updates

[COPYING.txt](https://github.com/tawesoft/html5spec/blob/master/COPYING.txt) (which is copied into the JSON) should be updated - in particular there is a new version of the W3C document license that needs linking to. This should also be updated in COPYING.md.
