Closed
Description
HTTP.jl uses the http_parser_parse_url function to parse URLs.
Line 161 in 6ee7083
    function http_parser_parse_url(url::String)
I believe this code is based on ngx_http_parse.c from NGINX. @quinnj is that right?
I recently added more URI parsing tests based on https://github.com/cweb/url-testing/blob/master/urls.json and, while debugging them, wrote a simple regex pattern based on the regex given in RFC 3986 (Appendix B).
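For reference, here is a minimal sketch of the RFC 3986 Appendix B regex used directly in Julia. The constant name `rfc3986_regex` is made up for this illustration and is not part of HTTP.jl; the example URI is the one used in the RFC itself.

```julia
# Regex from RFC 3986, Appendix B. The name `rfc3986_regex` is hypothetical.
const rfc3986_regex =
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"

m = match(rfc3986_regex, "http://www.ics.uci.edu/pub/ietf/uri/#Related")

# Per the RFC, the components are groups 2, 4, 5, 7 and 9.
scheme    = m[2]   # "http"
authority = m[4]   # "www.ics.uci.edu"
path      = m[5]   # "/pub/ietf/uri/"
query     = m[7]   # nothing, because the URI has no query part
fragment  = m[9]   # "Related"
```

Note that this regex leaves the authority as a single group; splitting it into userinfo, host and port needs extra sub-patterns.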
It turns out that the simple regex parser is faster than http_parser_parse_url.
Running test/uri_benchmark.jl shows that the regex parser runs in 47% of the time taken by http_parser_parse_url:
    3.058562 seconds (19.64 M allocations: 748.444 MiB, 2.00% gc time)
    http_parser_parse_url parsed 204 urls 10000 times in 3059.0 ms
    1.436758 seconds (18.69 M allocations: 1.159 GiB, 6.28% gc time)
    regex_parse parsed 204 urls 10000 times in 1437.0 ms (47.0%)
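A comparison of this kind can be sketched roughly as follows. This is not the contents of test/uri_benchmark.jl; `time_parser` and `relative_percent` are hypothetical helpers, and the two-URL list and the stand-in parser are placeholders for the real test set and parsers.

```julia
# Hypothetical helper: time `parse_fn` over every URL, `repeats` times.
function time_parser(parse_fn, urls, repeats)
    parse_fn(first(urls))  # warm-up call so JIT compilation is not timed
    elapsed = @elapsed begin
        for _ in 1:repeats, url in urls
            parse_fn(url)
        end
    end
    return elapsed * 1000  # convert seconds to milliseconds
end

# Hypothetical helper: new time as a percentage of the old time.
relative_percent(t_new, t_old) = round(100 * t_new / t_old; digits = 1)

# Placeholder inputs; the real benchmark uses 204 URLs and 10000 repeats.
urls = ["http://example.com/", "https://example.com/a?b=c#d"]
ms = time_parser(u -> match(r"^([^:/?#]+):", u), urls, 1000)

# Applied to the two timings reported above:
relative_percent(1437.0, 3059.0)  # 47.0
```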
The regex parser is in URIs.jl here:
Lines 101 to 121 in 6ee7083
    const uri_reference_regex =
    r"""^
    (?: ([^:/?#]+) :) ?                      # 1. sheme
    (?: // (?: ([^/?#@]*) @) ?               # 2. userinfo
           (?| (?: \[ ([^\]]+) \] )          # 3. host (ipv6)
             | ([^:/?#\[]*) )                # 3. host
           (?: : ([^/?#]+) ) ? ) ?           # 4. port
    ([^?#]*)                                 # 5. path
    (?: \?([^#]*) ) ?                        # 6. query
    (?: [#](.*) ) ?                          # 7. fragment
    $"""x
    const empty = SubString("", 1, 0)
    function regex_parse(::Type{URI}, str::AbstractString)
        m = match(uri_reference_regex, str)
        if m == nothing
            return emptyuri
        end
        return URI(str, (c = m[1]) == nothing ? empty : c,
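Since the embed above is cut off, here is a self-contained sketch of the same idea that returns a NamedTuple instead of a `URI`. The names `sketch_uri_regex` and `sketch_parse` are made up for illustration, and returning `nothing` on a failed match (rather than `emptyuri`) is an assumption of this sketch, not what URIs.jl does.

```julia
# Same pattern as above; `sketch_uri_regex` is a hypothetical name.
const sketch_uri_regex =
    r"""^
    (?: ([^:/?#]+) :) ?                      # 1. scheme
    (?: // (?: ([^/?#@]*) @) ?               # 2. userinfo
           (?| (?: \[ ([^\]]+) \] )          # 3. host (ipv6)
             | ([^:/?#\[]*) )                # 3. host
           (?: : ([^/?#]+) ) ? ) ?           # 4. port
    ([^?#]*)                                 # 5. path
    (?: \?([^#]*) ) ?                        # 6. query
    (?: [#](.*) ) ?                          # 7. fragment
    $"""x

# Hypothetical parser: returns a NamedTuple of components, with ""
# standing in for any component that is absent from the input.
function sketch_parse(str::AbstractString)
    m = match(sketch_uri_regex, str)
    m === nothing && return nothing
    part(i) = something(m[i], "")   # unmatched groups come back as `nothing`
    return (scheme = part(1), userinfo = part(2), host = part(3),
            port = part(4), path = part(5), query = part(6),
            fragment = part(7))
end

u = sketch_parse("http://user@example.com:8080/path?q=1#frag")
v = sketch_parse("http://[::1]:80/")   # IPv6 literal host
```

The branch-reset group `(?| ... )` lets both host alternatives share capture group 3, so the caller does not need to check two different groups for the host.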