Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added xmpp and mailto support to the autoprefixer extension #274

Merged
merged 6 commits into from
Aug 25, 2022

Conversation

stevenlaidlaw
Copy link

cmark-gfm's autlinker extension detects email addresses and automatically converts them into mailto: links. The xmpp protocol also contains addresses that look semantically identical to email addresses, and so are being mislinked as emails.

Example markdown and output:

me@domain.tld

mailto:me@domain.tld

xmpp:me@domain.tld

xmpp:me@domain.tld/join
<p><a href="mailto:me@domain.tld">me@domain.tld</a></p>
<p>mailto:<a href="mailto:me@domain.tld">me@domain.tld</a></p>
<p>xmpp:<a href="mailto:me@domain.tld">me@domain.tld</a></p>
<p>xmpp:<a href="mailto:me@domain.tld">me@domain.tld</a>/join</p>

Instead the desired output would be as so:

<p><a href="mailto:me@domain.tld">me@domain.tld</a></p>
<p><a href="mailto:me@domain.tld">mailto:me@domain.tld</a></p>
<p><a href="xmpp:me@domain.tld">xmpp:me@domain.tld</a></p>
<p><a href="xmpp:me@domain.tld/join">xmpp:me@domain.tld/join</a></p>

This allows both xmpp and mailto protocols to be specified directly, and the autolinker should be smart enough to handle both of those, and ignore any other protocols so we don't break any existing functionality (such as someone typing Send an email to:me@domain.tld, for example).

The changes have been made and tests updated to reflect the change in spec.

@wooorm
Copy link

wooorm commented Aug 18, 2022

Should the loop break instead of continue on a protocol? It looks like this would accept several: mailto:aaaxmpp:aaa@stuff.com.

@wooorm
Copy link

wooorm commented Aug 18, 2022

Most constructs are parsed “normally”, such as the protocols here, instead of postprocessing the text like emails here.
Perhaps these could also be parsed that way?
The benefit of doing that, is that such a protocol is unlikely to occur in the wild as “normal text”, so what follows can be parsed more leniently. Email addresses without a protocol are more prone to yielding false positives, so they have to be stricter.

@stevenlaidlaw
Copy link
Author

Should the loop break instead of continue on a protocol? It looks like this would accept several: mailto:aaaxmpp:aaa@stuff.com.

No, the protocols only work when surrounded by non-alphanumeric characters so the above example would result in the following:

<p>mailto:aaaxmpp:<a href="mailto:aaa@stuff.com">aaa@stuff.com</a></p>

Most constructs are parsed “normally”, such as the protocols here, instead of postprocessing the text like emails here.
Perhaps these could also be parsed that way?

Unfortunately that's not possible here due to the way the autolinker finds email addresses. We could catch the protocols there and apply them, yes, but it would then also apply the automatic mailto on top of it when it matches on the two emails that now exist.

We need to handle the email-like protocols within the email section of the autolinker specifically to prevent this from happening.

@wooorm
Copy link

wooorm commented Aug 19, 2022

I don’t understand your second point. Postprocessing text, where email detection happens, excludes links. It doesn’t link, say, [example@example.com](#) either. As I understand the code, autolink literals with protocol or www are also tagged as that node type. I believe you’re saying that they would interfere, but I don’t see how?

@stevenlaidlaw
Copy link
Author

@wooorm Ah so it does. Good catch, I'll look into implementing those changes. Thanks!

@stevenlaidlaw
Copy link
Author

@wooorm I spent some time over the past two days exploring moving the protocol validation out of the postprocess function and into the match. Unfortunately this would be a much larger change for very little gain as what we have here now works.

I was already at double the code length just replicating the validation, and we'd have to write the email matching again from scratch (or at least pull it out into it's own function). Where it currently sits the email matching is already happening, so the simplest way to get this working is as I've currently made the change.

I do agree it would be nice to do it that way you're suggesting, but the added development time isn't worth it when the current process handles the new protocol additions perfectly already.

@wooorm
Copy link

wooorm commented Aug 24, 2022

Huh, weird that it is so complex! I thought you’d have to do very little validation. Because, when prefixed with www., it already works:

www.me@domain.tld

www.mailto:me@domain.tld

www.xmpp:me@domain.tld

www.xmpp:me@domain.tld/join

www.me@domain.tld

www.mailto:me@domain.tld

www.xmpp:me@domain.tld

www.xmpp:me@domain.tld/join

...So I’d imagine that it would only be adding xmpp: and mailto: where http://, https://, and ftp:// are happening now, and then switching to the rest of www_match.

@wooorm
Copy link

wooorm commented Aug 24, 2022

It sounds like you are trying to implement the more strict proper parsing that happens for emails, where’s I’m thinking more: because there’s such an unlikely-to-occur-in-prose protocol already, it can go straight to check_domain (with allow_short, just like www), to essentially accept any non-whitespace character.

Alas, I’d hope that this loose www like matching works. But I get the time constraint.

@stevenlaidlaw
Copy link
Author

The problem there is that an email isn't just a domain, so it's not so simple to just use that function. The domain part works for everything after the @ in the XMPP example, but / is not valid as part of a MAILTO domain. It also doesn't account for everything before and including the @ symbol, including special characters allowed in email addresses and the protocol itself.

Either way it's more a question of "where" the code should sit, and not really functionality. As this currently stands the code works in the many varied test cases I've provided, and so I don't know that there is much more to be gained by refactoring this out to use match instead of postprocess.

I do appreciate the feedback through and it did give me a chance to explore other options which is always a positive.

@stevenlaidlaw stevenlaidlaw merged commit 0578e1e into master Aug 25, 2022
@stevenlaidlaw stevenlaidlaw deleted the feature/add-xmpp-support branch August 25, 2022 00:56
@UziTech
Copy link

UziTech commented Aug 25, 2022

Will this be added to the autolink spec?

@wooorm
Copy link

wooorm commented Aug 25, 2022

I second that it is very important to update the spec for these things.
And more generally: improve the spec, a lot of things are not explained.
Referencing also #270.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants