u- parsing should always do relative URL resolution #10

Closed
@Zegnat

This question is separate from but affects #9.

Currently the parsing description for u- properties is as follows:

  • if a.u-x[href] or area.u-x[href], then get the href attribute
  • else if img.u-x[src] or audio.u-x[src] or video.u-x[src] or source.u-x[src], then get the src attribute
  • else if video.u-x[poster], then get the poster attribute
  • else if object.u-x[data], then get the data attribute
  • if there is a gotten value, return the normalized absolute URL of it, following the containing document's language's rules for resolving relative URLs (e.g. in HTML, use the current URL context as determined by the page, and first <base> element, if any).
  • else parse the element for the value-class-pattern. If a value is found, return it.
  • else if abbr.u-x[title], then return the title attribute
  • else if data.u-x[value] or input.u-x[value], then return the value attribute
  • else return the textContent of the element after removing all leading/trailing whitespace and nested <script> & <style> elements.

Note that URL normalisation is applied at the fifth point, so it only covers values taken from the attributes in the first four points. Values gained from VCP, abbr, data, or input are never normalised. Is this really correct?
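
To make the ordering concrete, here is a rough Python sketch of the steps as currently described. The `Element` class and the `parse_u_current` name are hypothetical stand-ins rather than any real parser's API, the value-class-pattern step is left out, and `urljoin` only approximates "normalized absolute URL" resolution.

```python
from dataclasses import dataclass, field
from urllib.parse import urljoin


@dataclass
class Element:
    """Hypothetical stand-in for a parsed DOM element."""
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""


# (tag, attribute) pairs checked by the first four steps, in order.
URL_SOURCES = [
    ("a", "href"), ("area", "href"),
    ("img", "src"), ("audio", "src"), ("video", "src"), ("source", "src"),
    ("video", "poster"),
    ("object", "data"),
]


def parse_u_current(el: Element, base_url: str) -> str:
    # Steps 1-4: look for a URL-bearing attribute.
    for tag, attr in URL_SOURCES:
        if el.tag == tag and attr in el.attrs:
            # Step 5: only these values are resolved against the base URL.
            return urljoin(base_url, el.attrs[attr])
    # Step 6 (value-class-pattern) is omitted from this sketch.
    if el.tag == "abbr" and "title" in el.attrs:
        return el.attrs["title"]  # returned as-is
    if el.tag in ("data", "input") and "value" in el.attrs:
        return el.attrs["value"]  # returned as-is
    return el.text.strip()
```

With a base URL of `https://example.com/`, an `a` element with `href="#partial-feed"` comes back resolved, while a `data` element with `value="#partial-feed"` comes back untouched.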

I ran into an issue here when implementing a partial feed. In this case I did not want the feed title to link to itself as that made no sense in relation to the surrounding HTML. Thus I opted for data instead of a:

<div class="h-feed" id="partial-feed">
  <h2 class="p-name"><data class="u-url" value="#partial-feed">Partial Feed</data></h2>
</div>

However, because data[value] is never normalised, I am forced to write an absolute URL in there. That will hurt portability of the code.
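
As a small illustration of the portability point, this is what relative resolution would do with the same fragment-only value when the markup is served from different locations. Both base URLs are made up for the example, and `urljoin` is used as an approximation of the resolution rules.

```python
from urllib.parse import urljoin

# Hypothetical URLs at which the same HTML snippet might be served.
bases = ["https://example.com/", "https://staging.example.com/notes/"]

# The relative value stays the same; only the resolved result changes per page.
resolved = [urljoin(base, "#partial-feed") for base in bases]

for url in resolved:
    print(url)
```

A hard-coded absolute URL would have to be edited for each of these locations, whereas the relative value can be copied unchanged.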

I also think it is bad for input-based values. My reasoning here is that a microformats editor should be able to use the same parsing algorithm on the editing view and on the saved output. But if someone writes #fragment in an input element's text field, the algorithm will output #fragment, and if this is converted to an a element on save, the same algorithm will output https://example.com/#fragment.

I propose moving the 5th point (“if there is a gotten value, return the normalized absolute URL […]”) as far down the list as possible. Is there any reason why for specific elements this should not be done? I am not sure of abbr but can’t come up with any abbr.u-x use-cases either.

If people can come up with good reasons why outputs for u- properties should not always be normalised for VCP and abbr, I still propose to move the data/input case above the normalisation step.
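
One possible reading of the proposal, sketched in Python: pick the raw value first, then resolve whatever was found as the very last step. The function name, the `ATTRS_BY_TAG` table, and the choice to resolve the textContent fallback as well are assumptions made for this sketch. The value-class-pattern step is omitted, and `urljoin` stands in for full URL normalisation.

```python
from urllib.parse import urljoin

# Attributes consulted per tag, following the order of the list in the description.
ATTRS_BY_TAG = {
    "a": ["href"], "area": ["href"],
    "img": ["src"], "audio": ["src"], "source": ["src"],
    "video": ["src", "poster"],
    "object": ["data"],
    "abbr": ["title"],
    "data": ["value"], "input": ["value"],
}


def parse_u_proposed(tag: str, attrs: dict, text: str, base_url: str) -> str:
    """Hypothetical variant: select the raw value, then always resolve it."""
    raw = next((attrs[a] for a in ATTRS_BY_TAG.get(tag, []) if a in attrs), None)
    if raw is None:
        # Fallback: trimmed text content (script/style stripping not shown).
        raw = text.strip()
    # Resolution now happens after every branch, not only for href/src/poster/data.
    return urljoin(base_url, raw)


base = "https://example.com/"
from_input = parse_u_proposed("input", {"value": "#fragment"}, "", base)
from_link = parse_u_proposed("a", {"href": "#fragment"}, "", base)
```

Under this ordering the editing view (`input`) and the saved output (`a`) yield the same value, which is the consistency the editor use case asks for.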
