-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework the URI parser with Angstrom #142
Rework the URI parser with Angstrom #142
Conversation
This diff reworks the current parser based on regexes in favor of one that uses Angstrom. It also removes the dependency on `ocaml-re`, moving the regular expressions to a new package `uri-re`. This happens to fix mirage#21.
288206f
to
7c619f0
Compare
Interesting work. I would like to now that you translated already available regexp to |
@dinosaure This is mostly based on what the previous implementation did. I was reading the spec along with the work and noticed that we don't follow it too closely in some places. My rationale for preserving the same assumptions was that it's already a fairly big change, and I didn't want to introduce more noise than necessary. I think this sets up a good base to iterate on the parts that need to follow the spec more cloesly. What do you think? |
I agree with you but |
Thanks for the excellent work on this PR! I'm just getting back from vacation and can review it more over the next week. I'm in favour of merging and releasing an Angstrom version that is as close to the behaviour of |
I've pushed avsm/ocaml-uri@4bf0535 over to https://github.com/avsm/ocaml-uri/tree/angstrom-parser that implements the differential fuzzer. So far, it's only found one difference: the handling of |
This is great! I can add you to my fork if you wanna push there. I can look into fixing that difference soon too. |
pushed :) |
Any further thoughts on that difference Antonio? It would be good to get this into a release |
Thanks for the ping -- I had forgotten about this but I'm still interested in it. Give me a couple days and I'll look into this. |
Much appreciated @anmonteiro! No rush of course :-) |
@avsm Just pushed a commit that fixes the differences in newline handling between the old (regex) and new (Angstrom) parsers. Also added a compatibility test. |
Thank you for working on this @anmonteiro! |
Let's revive this PR. |
CHANGES: * sexp: use the sexplib v0.13 ppx directives (@avsm, mirage/ocaml-uri#143). * rework the URI parser with `angstrom` (@anmonteiro, review @avsm & @dinosaure, mirage/ocaml-uri#142). * add simple fuzzer tests between `angstrom` parser and _legacy_ parser (with `re.posix`, mirage/ocaml-uri#142) * add support of modifying pct encoding (with a custom one) (@orbitz, review @anmonteiro, @tmcgilchrist, @avsm & @dinosaure, mirage/ocaml-uri#147) * allow the selection of generic set of safe characters (with `Generic`) (@madroach, review @dinosaure, mirage/ocaml-uri#141)
CHANGES: * sexp: use the sexplib v0.13 ppx directives (@avsm, mirage/ocaml-uri#143). * rework the URI parser with `angstrom` (@anmonteiro, review @avsm & @dinosaure, mirage/ocaml-uri#142). * add simple fuzzer tests between `angstrom` parser and _legacy_ parser (with `re.posix`, mirage/ocaml-uri#142) * add support of modifying pct encoding (with a custom one) (@orbitz, review @anmonteiro, @tmcgilchrist, @avsm & @dinosaure, mirage/ocaml-uri#147) * allow the selection of generic set of safe characters (with `Generic`) (@madroach, review @dinosaure, mirage/ocaml-uri#141)
This diff reworks the current parser based on regexes in favor of one
that uses Angstrom.
It also removes the dependency on
ocaml-re
, moving the regularexpressions to a new package
uri-re
.The current state of this work aready passes the unit tests, but it might need to further work to get it into a mergeable state. I'm opening a PR to get any and all feedback. Thanks in advance!
This happens to fix #21.