Description
Consider the following:
import markdown
md = '''
Here are some elements:
* url <http://example.org>
* repo url <ssh://example.org>, which is a non-HTTP URL
* and <urn:foo> is something else
* ssh url2 <ssh:me@example.org>, handled as an email address
* misc element <em>boo!</em>
'''
converter = markdown.Markdown()
print(converter.convert(md))
This renders as:
<p>Here are some elements:</p>
<ul>
<li>url <a href="http://example.org">http://example.org</a></li>
<li>repo url <ssh://example.org>, which is a non-HTTP URL</li>
<li>and <urn:foo> is something else</li>
<li>ssh url2 <a href="mailto:ssh:me@example.org">ssh:me@example.org</a>, handled as an email address</li>
<li>misc element <em>boo!</em></li>
</ul>
I think items 2 and 3 are incorrect, (a) because the behaviour doesn't match two significant Markdown specs, and (b) because they are both invalid XML (yes, `<urn:foo>` looks like an XML element with a namespace prefix; let's not go there...).
The autolink feature in the Daring Fireball spec is ‘for URLs and email addresses’ (though the only URL in that example is an HTTP URL). The corresponding section in the CommonMark spec says that the autolink should happen for an absolute URI. So the second case should be turned into `<a href='ssh://example.org'>ssh://example.org</a>`.
What appears to be happening, instead, is that this is being interpreted as literal HTML. The relevant section of Gruber's spec is rather vague, but the corresponding part of the CommonMark spec says that this should happen only to ‘[t]ext between < and > that looks like an HTML tag’, which of course `<ssh://example.org>` doesn't (CommonMark: ‘A tag name consists of an ASCII letter followed by zero or more ASCII letters, digits, or hyphens (-)’).
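As a rough illustration of the tag-name rule quoted above, here is a minimal sketch. The helper name `is_tag_name` is my own, and matching the whole bracketed content is a simplification (a real open tag may also carry attributes); it is not how Python-Markdown or any CommonMark parser is actually implemented.

```python
import re

# Tag-name rule as quoted from CommonMark: an ASCII letter followed by
# zero or more ASCII letters, digits, or hyphens.
TAG_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_tag_name(text):
    """Return True if the whole string is a valid tag name (hypothetical helper)."""
    return TAG_NAME.fullmatch(text) is not None


# The contents of the angle brackets from the example list.
for candidate in ("em", "ssh://example.org", "urn:foo"):
    print(candidate, is_tag_name(candidate))
```

Only `em` passes; the colon in the other two rules them out as tag names.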
Independently of any spec, however, having `<ssh://example.org>` appear in the output means that the output is syntactically invalid, and I feel this shouldn't happen for any input, however insane.
Suggestion:
- When `<starttag>` consists of something other than `[a-zA-Z][a-zA-Z0-9-]*`, then it is either a URI, in which case it should be turned into an `<a>` element, or it is not, in which case it should be included literally in the output, as if the content were instead enclosed in backticks.
This would imply that item 3 should render as `<code>urn:foo</code>`.
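The suggestion could be sketched roughly as follows. This is only an illustration, not a patch against Python-Markdown: `render_angle_content` and both patterns are hypothetical names of mine. The suggestion doesn't define what counts as a URI, so the sketch assumes "a scheme followed by `://`", which gives the outcome stated for item 3 (CommonMark's broader absolute-URI rule would instead treat `urn:foo` as a link). Email autolinks are left out entirely.

```python
import html
import re

# Tag-name pattern from the suggestion.
TAG_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")

# Assumption: a URI is a scheme, then "://", then at least one character
# that is not whitespace or an angle bracket.
URI_LIKE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://[^\s<>]+")


def render_angle_content(content):
    """Render the text found between '<' and '>' (hypothetical helper)."""
    if TAG_NAME.fullmatch(content):
        # Looks like a plain start tag: pass it through as raw HTML.
        return "<" + content + ">"
    escaped = html.escape(content, quote=True)
    if URI_LIKE.fullmatch(content):
        # Looks like a URI: turn it into a link.
        return '<a href="{0}">{0}</a>'.format(escaped)
    # Anything else: emit it literally, as if it were in backticks.
    return "<code>" + escaped + "</code>"


for item in ("em", "ssh://example.org", "urn:foo"):
    print(render_angle_content(item))
```

With this rule every branch produces well-formed output, which is the main point: nothing shaped like `<ssh://example.org>` can reach the result.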