Skip to content

Improve handling of clobbering #31

Closed
@crazygolem

Description

@crazygolem

Initial checklist

Problem

The way DOM clobbering prevention is implemented, this plugin leaves sanitized HTML broken: it correctly rewrites the attributes that can cause clobbering (id and name, configured by default) and by default some attributes that reference those IDs (ariaDescribedBy and ariaLabelledBy).

Notably missing is the href attribute, which can be used for internal links using a fragment which value matches the id or name attribute of an element. This breaks GFM footnotes (cf. #30, #29) but also internal links such as those targetting headings slugified by rehype-slug.

The current solution has been deemed acceptable because 1) it keeps the internal links "pretty" for the users; and 2) github likes it like that.

I think however that it is not very good, because it requires awareness of the issue otherwise the plugin just appears broken, and by default breaks the document and requires a workaround to be applied later. And the suggested workaround (the github script) also has limitations, like it wouldn't support multiple generated documents being included with different clobberPrefix to avoid collisions.

I would much prefer if the sanitizer wouldn't break my documents by default where there is no reason to. Pretty links are nice, but it should be a conscious decision to enable them as it requires extra effort to un-break the document afterwards.

Solution

I propose to improve the rewriting of attributes that reference another element via their ID as follows:

While visiting nodes:

  1. Prefix the id and name attributes as currently;
  2. Add the original value of the rewritten id and name attributes to a list of targets;
  3. If the node contains an attribute that can reference an ID (e.g. ariaDescribedBy, ariaLabelledBy but also href), add it to a list of references.

Now that the whole tree has been traversed, we know all the IDs in the sanitized document, and can start rewriting the references. Doing it after the traversal instead of during allows to be a bit smarter with the rewriting, and allows (maybe as an option) to keep references untouched when they don't target an ID within the document (if the document is later included into another one, this would allow referencing elements in the enclosing document, e.g. #top for an internal link, or some common aria labels or descriptions).

For the href attribute, I propose to only rewrite them if they contain just the fragment. So for example href="foobar.htm#section-xyz" would not be rewritten, because without extra context we can't know whether foobar.htm is the current page, but href="#section-xyz" would be rewritten to href="#user-content-section-xyz". This would at least fix the broken GFM footnotes, and generally internal links, which I think is a common case when converting from markdown.

This would require to special-case href because the assumption on the format used currently for the two aria attributes doesn't hold. Instead of that, I also propose to slightly change the format of the configuration to allow passing a function to handle the rewriting, which would let users add support for more exotic attributes, such as custom data-* attributes (not that anybody requested such a thing yet) or simply let the user handle the difficult things like the URL fragment not being on its own.

Alternatives

  • Do nothing: If github is happy, everyone's happy 😿
  • Monkey-patching the hast-util-sanitize lib, as mentioned by the reporter of Don't apply the clobber prefix when it's already present #29: Yes, it works, but we're basically all on our own. Since sanitization is security-relevant, I think it would be better to do it once right better. It would also be quite brittle, and could break with any update.
  • Write another plugin just to fix the links: This plugin would have less context to work with than the sanitize plugin, e.g. no idea which IDs have been rewritten (it would need to assume that specifically hast-util-sanitize was used and all IDs have been rewritten). Also an extra walk of the tree. And the plugin's name would be somewhat stupid: hast-util-sanitize-clobbering-fix.
  • Write another plugin to handle just clobbering: This would make the functionality in hast-util-sanitize redundant, while probably ending up lifting a big chunk of the current code, would require configuring another plugin to make it work (i.e. disabling the clobber handling in hast-util-sanitize), extra tree walk, annoying plugin name.

I hope you are at least open to the idea that my proposal can be better than the alternatives. On my part, I am also open to suggestions if my proposition doesn't quite fit the bill. I can implement it, but I can't promise to implement it fast =)

Metadata

Metadata

Assignees

No one assigned

    Labels

    👎 phase/noPost cannot or will not be acted on🤷 no/invalidThis cannot be acted upon

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions