Skip to content

Commit

Permalink
Replace linkRegex with xurls library (#6261)
Browse files Browse the repository at this point in the history
* Replace linkRegex with xurls library

Rather than maintaining a complicated regex to match URLs for
autolinking, gitea can use this existing go library that takes care of
the matching with very little code change to gitea itself. After
spending a while trying to find the perfect regex for all cases this library
still works better as it is more flexible than a single regex ever will be.

This will also fix the following issues: #5844 #3095 #3381

This passes all our current tests and I've added new ones mentioned in
those issues as well.

* Use xurls.StrictMatchingScheme instead of xurls.Strict

This is much faster and we only care about https? links to preserve
existing behavior.
  • Loading branch information
mrsdizzie authored and techknowlogick committed Mar 7, 2019
1 parent 01bd1fc commit f2de5dc
Show file tree
Hide file tree
Showing 9 changed files with 2,038 additions and 3 deletions.
9 changes: 9 additions & 0 deletions Gopkg.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 4 additions & 0 deletions Gopkg.toml
Original file line number Diff line number Diff line change
Expand Up @@ -113,3 +113,7 @@ ignored = ["google.golang.org/appengine*"]
[[constraint]]
name = "github.com/prometheus/client_golang"
version = "0.9.0"

[[constraint]]
name = "github.com/mvdan/xurls"
version = "2.0.0"
5 changes: 2 additions & 3 deletions modules/markup/html.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ import (
"code.gitea.io/gitea/modules/util"

"github.com/Unknwon/com"
"github.com/mvdan/xurls"
"golang.org/x/net/html"
"golang.org/x/net/html/atom"
)
Expand Down Expand Up @@ -64,9 +65,7 @@ var (
// https://html.spec.whatwg.org/multipage/input.html#e-mail-state-(type%3Demail)
emailRegex = regexp.MustCompile("[a-zA-Z0-9.!#$%&'*+\\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*")

// matches http/https links. used for autlinking those. partly modified from
// the original present in autolink.js
linkRegex = regexp.MustCompile(`(?:(?:http|https):\/\/(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+(?:\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)(?:(?:\/[\+~%\/\.\w\-]*)?\??(?:[\-\+:=&;%@\.\w]*)#?(?:[\.\!\/\\\w]*))?`)
linkRegex, _ = xurls.StrictMatchingScheme("https?://")
)

// regexp for full links to issues/pulls
Expand Down
9 changes: 9 additions & 0 deletions modules/markup/html_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,15 @@ func TestRender_links(t *testing.T) {
test(
"http://142.42.1.1/",
`<p><a href="http://142.42.1.1/" rel="nofollow">http://142.42.1.1/</a></p>`)
test(
"https://github.com/go-gitea/gitea/?p=aaa/bbb.html#ccc-ddd",
`<p><a href="https://github.com/go-gitea/gitea/?p=aaa/bbb.html#ccc-ddd" rel="nofollow">https://github.com/go-gitea/gitea/?p=aaa/bbb.html#ccc-ddd</a></p>`)
test(
"https://en.wikipedia.org/wiki/URL_(disambiguation)",
`<p><a href="https://en.wikipedia.org/wiki/URL_(disambiguation)" rel="nofollow">https://en.wikipedia.org/wiki/URL_(disambiguation)</a></p>`)
test(
"https://foo_bar.example.com/",
`<p><a href="https://foo_bar.example.com/" rel="nofollow">https://foo_bar.example.com/</a></p>`)

// Test that should *not* be turned into URL
test(
Expand Down
27 changes: 27 additions & 0 deletions vendor/github.com/mvdan/xurls/LICENSE

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

299 changes: 299 additions & 0 deletions vendor/github.com/mvdan/xurls/schemes.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit f2de5dc

Please sign in to comment.