Added xmpp and mailto support to the autoprefixer extension #274

stevenlaidlaw · 2022-08-18T01:20:13Z

cmark-gfm's autolinker extension detects email addresses and automatically converts them into mailto: links. The xmpp protocol also uses addresses that look syntactically identical to email addresses, so they are being mislinked as emails.

Example markdown and output:

me@domain.tld

mailto:me@domain.tld

xmpp:me@domain.tld

xmpp:me@domain.tld/join

<p><a href="mailto:me@domain.tld">me@domain.tld</a></p>
<p>mailto:<a href="mailto:me@domain.tld">me@domain.tld</a></p>
<p>xmpp:<a href="mailto:me@domain.tld">me@domain.tld</a></p>
<p>xmpp:<a href="mailto:me@domain.tld">me@domain.tld</a>/join</p>

Instead, the desired output would be:

<p><a href="mailto:me@domain.tld">me@domain.tld</a></p>
<p><a href="mailto:me@domain.tld">mailto:me@domain.tld</a></p>
<p><a href="xmpp:me@domain.tld">xmpp:me@domain.tld</a></p>
<p><a href="xmpp:me@domain.tld/join">xmpp:me@domain.tld/join</a></p>

This allows both the xmpp and mailto protocols to be specified directly. The autolinker should handle both of those and ignore any other prefix, so that we don't break any existing functionality (such as someone typing Send an email to:me@domain.tld, for example).

The changes have been made and tests updated to reflect the change in spec.
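The desired behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the C code from the pull request: `ADDRESS`, `CANDIDATE`, and `autolink_paragraph` are hypothetical names, and the character rules are much looser than what cmark-gfm really applies. Leaving a `/resource` tail outside a mailto link is also an assumption.

```python
import re
from html import escape

# Simplified address pattern for illustration only; cmark-gfm's real
# character rules are stricter and are not reproduced here.
ADDRESS = r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"

# Optional "mailto:" or "xmpp:" prefix, the address, and an optional
# "/resource" tail. The lookbehind keeps a match from starting mid-word.
CANDIDATE = re.compile(
    r"(?<![A-Za-z0-9])(?:(?P<proto>mailto|xmpp):)?"
    r"(?P<addr>" + ADDRESS + r")(?P<res>/[A-Za-z0-9@.]+)?"
)


def autolink_paragraph(text: str) -> str:
    """Render one line of text as a <p> with email-like addresses linked."""
    out = []
    pos = 0
    for m in CANDIDATE.finditer(text):
        proto, addr = m.group("proto"), m.group("addr")
        res = m.group("res") or ""
        out.append(escape(text[pos:m.start()]))
        if proto == "xmpp":
            # Only xmpp keeps the "/resource" part inside the link.
            target = "xmpp:" + addr + res
            label, tail = target, ""
        elif proto == "mailto":
            # Explicit mailto: the prefix becomes part of the link text.
            target = "mailto:" + addr
            label, tail = target, res
        else:
            # Bare address: link it as mailto, show only the address.
            target = "mailto:" + addr
            label, tail = addr, res
        out.append('<a href="%s">%s</a>%s'
                   % (escape(target), escape(label), escape(tail)))
        pos = m.end()
    out.append(escape(text[pos:]))
    return "<p>" + "".join(out) + "</p>"
```

Called on the four example lines, this sketch reproduces the desired output listed above, and `Send an email to:me@domain.tld` still links only the bare address.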

wooorm · 2022-08-18T08:11:55Z

Should the loop break instead of continue on a protocol? It looks like this would accept several: mailto:aaaxmpp:aaa@stuff.com.

wooorm · 2022-08-18T08:28:49Z

Most constructs are parsed “normally”, such as the protocols here, instead of postprocessing the text like emails here.
Perhaps these could also be parsed that way?
The benefit of doing that is that such a protocol is unlikely to occur in the wild as “normal text”, so what follows can be parsed more leniently. Email addresses without a protocol are more prone to yielding false positives, so they have to be stricter.

stevenlaidlaw · 2022-08-18T22:37:15Z

> Should the loop break instead of continue on a protocol? It looks like this would accept several: mailto:aaaxmpp:aaa@stuff.com.

No, the protocols only work when surrounded by non-alphanumeric characters, so the above example would result in the following:

<p>mailto:aaaxmpp:<a href="mailto:aaa@stuff.com">aaa@stuff.com</a></p>
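The boundary rule described here can be illustrated with a small backward check. The sketch below is an assumption about the general idea, not the pull request's C implementation: `PROTOCOLS` and `protocol_before` are hypothetical names, and only the character in front of the protocol is inspected.

```python
PROTOCOLS = ("mailto:", "xmpp:")


def protocol_before(text: str, local_start: int) -> str:
    """Return the protocol that ends right before local_start, or ""."""
    for proto in PROTOCOLS:
        begin = local_start - len(proto)
        # Not enough room for this protocol, or the text does not match.
        if begin < 0 or text[begin:local_start] != proto:
            continue
        # Reject the protocol when it is glued to a preceding word.
        if begin > 0 and text[begin - 1].isalnum():
            continue
        return proto
    return ""
```

For `mailto:aaaxmpp:aaa@stuff.com`, the address starts at index 15. The `xmpp:` in front of it is preceded by the letter `a`, so the function returns an empty string and the address would fall back to plain email handling.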

> Most constructs are parsed “normally”, such as the protocols here, instead of postprocessing the text like emails here.
> Perhaps these could also be parsed that way?

Unfortunately that's not possible here due to the way the autolinker finds email addresses. We could catch the protocols there and apply them, yes, but it would then also apply the automatic mailto on top of it when it matches on the two emails that now exist.

We need to handle the email-like protocols within the email section of the autolinker specifically to prevent this from happening.

wooorm · 2022-08-19T08:44:51Z

I don’t understand your second point. Postprocessing text, where email detection happens, excludes links. It doesn’t link, say, [example@example.com](#) either. As I understand the code, autolink literals with protocol or www are also tagged as that node type. I believe you’re saying that they would interfere, but I don’t see how?

stevenlaidlaw · 2022-08-22T06:24:01Z

@wooorm Ah so it does. Good catch, I'll look into implementing those changes. Thanks!

stevenlaidlaw · 2022-08-24T07:36:25Z

@wooorm I spent some time over the past two days exploring moving the protocol validation out of the postprocess function and into the match. Unfortunately this would be a much larger change for very little gain as what we have here now works.

I was already at double the code length just replicating the validation, and we'd have to write the email matching again from scratch (or at least pull it out into its own function). Where it currently sits, the email matching is already happening, so the simplest way to get this working is the way I've currently made the change.

I do agree it would be nice to do it the way you're suggesting, but the added development time isn't worth it when the current process already handles the new protocol additions perfectly.

wooorm · 2022-08-24T07:50:56Z

Huh, weird that it is so complex! I thought you’d have to do very little validation. Because, when prefixed with www., it already works:

www.me@domain.tld

www.mailto:me@domain.tld

www.xmpp:me@domain.tld

www.xmpp:me@domain.tld/join

www.me@domain.tld

www.mailto:me@domain.tld

www.xmpp:me@domain.tld

www.xmpp:me@domain.tld/join

...So I’d imagine that it would only be adding xmpp: and mailto: where http://, https://, and ftp:// are happening now, and then switching to the rest of www_match.

wooorm · 2022-08-24T07:58:32Z

It sounds like you are trying to implement the more strict proper parsing that happens for emails, whereas I’m thinking more: because there’s such an unlikely-to-occur-in-prose protocol already, it can go straight to check_domain (with allow_short, just like www), to essentially accept any non-whitespace character.

Alas, I’d hope that this loose www-like matching works. But I get the time constraint.
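The loose approach suggested here can be sketched as "protocol prefix, then any run of non-whitespace". This is a rough Python illustration under that assumption. It does not reproduce `check_domain` or `allow_short` from cmark-gfm, and `LOOSE` and `loose_links` are hypothetical names.

```python
import re

# A protocol that does not start mid-word, followed by any run of
# non-whitespace characters.
LOOSE = re.compile(r"(?<![A-Za-z0-9])(?:mailto|xmpp):\S+")


def loose_links(text: str) -> list:
    """Return every loosely matched protocol link found in text."""
    return [m.group(0) for m in LOOSE.finditer(text)]
```

A matcher this lenient takes `xmpp:me@domain.tld/join` whole, but it takes `mailto:me@domain.tld/join` whole as well, since nothing in it distinguishes the two protocols.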

stevenlaidlaw · 2022-08-25T00:53:32Z

The problem there is that an email isn't just a domain, so it's not so simple to just use that function. The domain part works for everything after the @ in the XMPP example, but / is not valid as part of a MAILTO domain. It also doesn't account for everything before and including the @ symbol, including special characters allowed in email addresses and the protocol itself.
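The point that an address has more parts than a domain can be made concrete by splitting a candidate into its pieces. The sketch below is illustrative only: `Address` and `parse_address` are hypothetical, it checks no character rules at all, and rejecting a `/` after a mailto address outright is an assumption made to keep the example short.

```python
from typing import NamedTuple, Optional


class Address(NamedTuple):
    protocol: str  # "", "mailto" or "xmpp"
    local: str     # everything before the "@"
    domain: str    # between the "@" and the first "/"
    resource: str  # text after the first "/", xmpp only


def parse_address(candidate: str) -> Optional[Address]:
    """Split a candidate into its parts, or return None if it is malformed."""
    protocol = ""
    rest = candidate
    for name in ("mailto", "xmpp"):
        if candidate.startswith(name + ":"):
            protocol, rest = name, candidate[len(name) + 1:]
            break
    local, at, after = rest.partition("@")
    if not at or not local or not after:
        return None
    domain, slash, resource = after.partition("/")
    if slash and protocol != "xmpp":
        return None  # in this sketch, "/" is only accepted for xmpp
    if not domain or (slash and not resource):
        return None
    return Address(protocol, local, domain, resource)
```

With this split, `xmpp:me@domain.tld/join` yields a resource of `join`, while the same tail after a mailto address causes the candidate to be rejected.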

Either way it's more a question of "where" the code should sit, and not really functionality. As this currently stands the code works in the many varied test cases I've provided, and so I don't know that there is much more to be gained by refactoring this out to use match instead of postprocess.

I do appreciate the feedback though, and it did give me a chance to explore other options, which is always a positive.

UziTech · 2022-08-25T01:15:36Z

Will this be added to the autolink spec?

wooorm · 2022-08-25T17:38:30Z

I second that it is very important to update the spec for these things.
And more generally: improve the spec, a lot of things are not explained.
Referencing also #270.

stevenlaidlaw added 4 commits July 14, 2022 13:21

Added xmpp and mailto support to the autoprefixer extension

d0866dd

Simplified logic

cfd56ef

Bugfixes

89b4cd5

Modified size type to prevent errors on windows

055f3c8

stevenlaidlaw self-assigned this Aug 18, 2022

stevenlaidlaw added the enhancement label Aug 18, 2022

Added some clarity around validation of characters

2aa10c9

sgoedecke approved these changes Aug 25, 2022


Added more complex test

4c64c2c

stevenlaidlaw merged commit 0578e1e into master Aug 25, 2022

stevenlaidlaw deleted the feature/add-xmpp-support branch August 25, 2022 00:56

wooorm mentioned this pull request Sep 2, 2022

GFM autolink extension (www., https?:// parts): links don’t work when after bracket #278

Open

zeha mentioned this pull request Nov 29, 2022

autolink: avoid out-of-bounds read in validate_protocol #296

Closed

wooorm mentioned this pull request Mar 30, 2023

Suggestion: Support For Mastodon Addresses syntax-tree/mdast-util-gfm-autolink-literal#7

Closed
