Replace the colon use in href by an underscore #240

Open
wants to merge 1 commit into
base: develop
from

@jasminlapalme jasminlapalme commented Jul 29, 2022

The footnote links in generated HTML files are broken in some browsers, such as Firefox. The cause of the problem is the colon used in the href. RFC 3986 (section 2.2) says that we cannot use the colon in an href, because the colon is one of the reserved characters:

2.2. Reserved Characters

URIs include components and subcomponents that are delimited by
characters in the "reserved" set. These characters are called
"reserved" because they may (or may not) be defined as delimiters by
the generic syntax, by each scheme-specific syntax, or by the
implementation-specific syntax of a URI's dereferencing algorithm.
If data for a URI component would conflict with a reserved
character's purpose as a delimiter, then the conflicting data must be
percent-encoded before the URI is formed.

  reserved    = gen-delims / sub-delims

  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

The purpose of reserved characters is to provide a set of delimiting
characters that are distinguishable from other data within a URI.
URIs that differ in the replacement of a reserved character with its
corresponding percent-encoded octet are not equivalent. Percent-
encoding a reserved character, or decoding a percent-encoded octet
that corresponds to a reserved character, will change how the URI is
interpreted by most applications. Thus, characters in the reserved
set are protected from normalization and are therefore safe to be
used by scheme-specific and producer-specific algorithms for
delimiting data subcomponents within a URI.

This PR changes every use of the colon (:) in an href to an underscore (_).

The RFC3986 says that we cannot use the colon in HREF (section 2.2)
@fletcher
Owner

  1. I do not claim to be an expert in the relevant RFCs by any stretch.

  2. In trying to read the RFC, I don't see where colons are prohibited. They seem to be allowed, but their use may indicate a delimiter between components of a URI, depending on the scheme/implementation.

  3. My "go-to" is the HTML validator at https://validator.w3.org/nu, which shows the current syntax as valid. Is there another validator that you use that can identify the current syntax as an error?
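
To illustrate point 2: the `fragment` rule in RFC 3986 (section 3.5) is `*( pchar / "/" / "?" )`, and `pchar` includes ":". The following is only a minimal sketch, assuming a runtime that provides the WHATWG `URL` class (Node.js or a browser). The helper name `fragmentOf` and the example.com address are made up for illustration and do not come from MultiMarkdown.

```javascript
// Hypothetical helper: return the fragment part of a URL as the parser sees it.
function fragmentOf(href) {
  return new URL(href).hash;
}

// The colon in the fragment is kept as-is; it is not percent-encoded.
console.log(fragmentOf("https://example.com/doc.html#fn:3"));
```

This only shows that a URL parser accepts the colon in a fragment. It says nothing about how a given browser scrolls to the target element.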

This syntax has been in place for quite a while (at least since MultiMarkdown v5, and likely earlier -- I don't have an old version installed and immediately accessible, but Babelmark 2 includes v 5.1.0 and 6.3.0). I have not heard of any issues with footnotes until today, so obviously I'm curious as to why this never came up before.

Thanks for any additional information you can provide!

@jasminlapalme
Author

Thanks for your quick response. I've added the rest of the section that lists the reserved characters. After the list, it clearly states:

The purpose of reserved characters is to provide a set of delimiting characters that are distinguishable from other data within a URI.

I know this is a very specific problem, but we ran into it a couple of times in our tests.

@Dalzhim

Dalzhim commented Jul 29, 2022

document.querySelector("#fn:3")
Uncaught DOMException: Document.querySelector: '#fn:3' is not a valid selector

The above JavaScript snippet fails to look up an anchor that uses a colon in its identifier, because the set of characters allowed unescaped in a selector is quite limited, as specified here: https://www.w3.org/TR/CSS21/syndata.html#characters

In CSS, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A0 and higher, plus the hyphen (-) and the underscore (_); they cannot start with a digit, two hyphens, or a hyphen followed by a digit. Identifiers can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance, the identifier "B&W?" may be written as "B\&W\?" or "B\26 W\3F".

I believe this is why you haven't had any issue with the current way of naming identifiers. It is because most use cases will work just fine. However, whenever using CSS or document.querySelector with these identifiers, it becomes problematic.
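
For existing documents that still use the colon, the selector can be made valid by escaping it, as the quoted CSS rule allows. Below is a minimal sketch, not a complete implementation: `escapeIdForSelector` is a hypothetical helper written for this comment, and it only backslash-escapes characters outside the set quoted above. It does not handle an identifier that starts with a digit, nor control characters such as newlines. In a browser, the built-in `CSS.escape()` covers those cases, and `document.getElementById("fn:3")` avoids selector syntax entirely.

```javascript
// Hypothetical helper: backslash-escape every character that is not allowed
// unescaped in a CSS identifier ([a-zA-Z0-9], "-", "_", U+00A0 and higher).
function escapeIdForSelector(id) {
  return id.replace(/[^a-zA-Z0-9_\u00A0-\uFFFF-]/g, (ch) => "\\" + ch);
}

// Build a selector for the footnote anchor from the example above.
const selector = "#" + escapeIdForSelector("fn:3");
console.log(selector); // prints #fn\:3

// In a browser, this selector could then be passed to querySelector:
//   document.querySelector(selector)
```

With the underscore form proposed in this PR, the identifier would need no escaping at all.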

3 participants