Fix URL parser for parentheses and Chinese Characters#43
Fix URL parser for parentheses and Chinese Characters#43singularity-s0 wants to merge 8 commits intoCretezy:masterfrom
Conversation
|
May I suggest adding a test to reproduce that? |
|
|
||
| void main() { | ||
| print(linkify("Made by https://cretezy.com person@example.com")); | ||
| print(linkify( |
There was a problem hiding this comment.
Hey, mind reverting this and adding a test instead?
|
Sorry, I'm quite new to GitHub and it looks like I'm not quite aware of how PRs and tests work. The last 5 "experimental" commits were not meant for merging. I was trying to find the best Regex that meets the need of my forum (where users may post links). Although I've found the best Regex for my app, it may not be the best for general purposes and thus might not be suitable for merging. Since this is only a simple Regex change, I suggest closing this PR and manually "merge" this commit, if needed. As for tests, since I'm not familiar with how it works..., could you please provide instructions on how to add one? Sorry for troubling you... |
|
Hey, thanks for the contribution! I could do that, or if you wish I can walk you through bring up your PR to something we can merge together (and you learn!). Let me know! |
|
Thank you very much for your kind offer :) Where should we begin? |
|
After experimenting with the URL parser, I came up with several ideas: Unexpected Unicode CharactersCurrently, an URL element will end only when there is a whitespace character matched by According to RFC3986, encoded URLs may only contain the following characters: which translates to (not sure why but it looks like the syntax is a little different in Dart, why does This will prevent unexpected unicode characters from being matched into our LinkElement. ParenthesesParentheses appear as sub-delimiters in RFC3986, so they are technically allowed as a part of the URL. But in practical context they are often confused with the surrounding text (e.g. in Markdown, where links always ends with |
Currently, URLs like
http://foo.com)你好will be interpreted as is. Instead, it should be separated intohttp://foo.comand)你好(according to URL conventions).Although parentheses are by definition allowed in URLs, they usually appear encoded. So parentheses that appear at the end of a URL are, for most of the time, not part of it.
Chinese characters are simply not allowed in URLs.
This PR fixes those issues.