Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse complicated url problem #366

Closed
lufeihaidao opened this issue Mar 13, 2014 · 13 comments · Fixed by #1135
Closed

Parse complicated url problem #366

lufeihaidao opened this issue Mar 13, 2014 · 13 comments · Fixed by #1135
Labels
category: links L1 - broken Valid usage causes incorrect output OR a crash AND there is no known workaround for the issue

Comments

@lufeihaidao
Copy link

In github, [Ajax](http://en.wikipedia.org/wiki/Ajax_(programming)) will be correctly parsed: Ajax, but marked will add unnecessary )

@chjj
Copy link
Member

chjj commented Apr 28, 2014

Marked follows original mardown.pl behavior for this. Libupskirt/sundown also shares the same behavior.

If we're supposed to allow parentheses in urls, what happens when parentheses are incorrectly closed?

[foo](http://example.com/foo_)bar)

What should happen here? The only way to parse () is by keeping state and track of how many levels deep of nested parentheses you're in, but there are no rules for URLs. URLs might provide misnested parens, etc.

What about:

[foo](http://example.com/foo_(bar)

Should marked keep parsing the text after "bar)" since it hasn't seen a proper closing paren for the link syntax?

How about this:

[foo](http://example.com/foo_(((((((((bar))

Is marked even supposed to guess what is going on there by interpreting the parens in that url?

I feel like this change would only exist to accommodate wikipedia links, which can already either be encoded, or included as link references.

If someone can come up with a definite outline of what should happen in all these other cases, I'm all for it.

@dedalozzo
Copy link

I got your point. Since you must have a line break, or a space after an url, or another url, or still you have reached the end of the document, you can simply consider the last parenthesis, like the parenthesis that closes the link.

[foo](http://example.com/foo_(bar)

In the above example, you should have the following link: http://example.com/foo_(bar. Using this approach every wikipedia link just works as is. You just consider the open and the close round brackets, ignoring all the others inside the link. I'm gonna post this bug in Hoedown too, Sundown is a dead project.

@JCMais
Copy link

JCMais commented Apr 28, 2014

I just saw this while reading the source code from hoedown, https://github.com/hoedown/hoedown/blob/master/src/autolink.c#L102

It's from the autolink feature, but somewhat related, isn't?

@dedalozzo
Copy link

I have just known that the bug was fixed in Hoedown long time ago. Yeah, I suppose you right.

@scottgonzalez
Copy link
Contributor

I got your point. Since you must have a line break, or a space after an url, or another url, or still you have reached the end of the document, you can simply consider the last parenthesis, like the parenthesis that closes the link.

That would break links at the end of a parenthetical, which is common enough to be concerning.

@Ernest0x
Copy link

Perhaps you could also support using double quotes inside the parenthesis and take everything inside the quotes as the url:

[foo]("http://example.com/foo_(bar")

@scottgonzalez
Copy link
Contributor

@chjj I think it's time to close this and just tell people to manually encode the URLs using %29 instead of ).

@scottgonzalez
Copy link
Contributor

Actually, you'll need to implement paren matching logic for GFM. It looks like GitHub always requires paired parens inside a URL.

@chjj
Copy link
Member

chjj commented May 15, 2014

@scottgonzalez, I've told people that before. It doesn't seem very effective. This is probably the second issue on this. GFM does do it, and I think it's unintelligent, but I'll try it and see how it goes. Maybe it won't hurt performance much and I'll be okay with it despite disagreeing with it philosophically.

@dedalozzo
Copy link

You can't ask people to encode urls, because they simply copy and paste urls from the address bar of a browser. :-(

-Filippo

On May 15, 2014, at 2:19 PM, Scott González wrote:

@chjj I think it's time to close this and just tell people to manually encode the URLs using %29 instead of ).


Reply to this email directly or view it on GitHub.

@sn3p
Copy link

sn3p commented Feb 29, 2016

Encoding URLs is not an option :(
This is a show stopper for us, and I have the feeling this won't be fixed any time soon...
Guess it's time to switch library. What would be a good alternative?

@dedalozzo
Copy link

I'm using Hoedown, a PHP extension to the Hoedown library.

@joshbruce
Copy link
Member

#984

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: links L1 - broken Valid usage causes incorrect output OR a crash AND there is no known workaround for the issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants