I brought the checklist up to the top, for visibility:
- change the import semantics to define chaining local imports onto remote
  imports in terms of RFC3986 relative reference resolution; (Use RFC3986 section 5 URL resolution algorithm #593, Simpler way of incorporating RFC3986 resolution #603)
- change `dhall.abnf` to allow any RFC3986-compliant URL, in addition to the
  currently-allowed URLs with path components. This will need some care, but
  it will not be a breaking change because we are only allowing new URLs, not
  removing or changing the meaning of existing URLs. (It is helpful here that
  the double quote character `"` is not a valid character in RFC3986). (Allow all RFC3986-compliant URLs #604)
- change `dhall format` to prefer percent-encoded URLs over quoted-path URLs (dhall format
  should prefer percent-encoded URL paths instead of quoted paths dhall-haskell#1109 and [dhall format] Prefer unquoted URLs dhall-haskell#1235)
- deprecate non-RFC3986 URLs (including quoted-path URLs) (Add deprecation plan for URLs with quoted paths #913)
- after a suitable deprecation period, remove non-RFC3986 URLs from the
  language
Original proposal below:
Before #127, Dhall accepted standard URLs as defined by RFC 3986. #127 changed
URL paths to have the same syntax as local paths. The stated reason for this
seems to be that we want to be able to host the same set of Dhall files on local
disk or remote server, and have them work the same.
#205 shows that some confusion arises as a result of not allowing all standard URLs.
I see two main problems with the current solution:
First, it means URLs in Dhall are surprisingly different from URLs everywhere
else. As an example, I particularly dislike how `dhall format` converts
percent-encoded URLs (a thing I understand and expect from other contexts) to
quoted paths (a thing I do not understand in a URL context, because it is not
valid anywhere else). I find it a jarring experience, and it dents my confidence
about whether I know how to format URLs the way Dhall expects, and whether
the URLs I put into Dhall mean what I think they mean.
Second, it assumes that URL path segments and local path segments have the same
semantics, when they do not. In particular, percent-encoding means something in
URLs but does not mean anything in local paths. To see this, imagine a file
`foo.dhall` containing a local import `./foo%20bar.dhall`. If `foo.dhall` is
a local file, this will open a file whose literal name is `foo%20bar.dhall`; but
if `foo.dhall` is a remote file, it will make a web request for
`https://.../foo%20bar.dhall`, which will open a file whose literal name is
`foo bar.dhall`.
This means that, in some cases, moving a set of files from local to remote (or vice versa) will break imports.
EDIT: this is incorrect, see Gabriel's comment below
(There is also a very minor point that we do not currently accept some valid
URLs, beyond the types listed at the top of this issue. As an example,
http://example.com/(foo) is a valid URL that Dhall will not parse.)
In my view, the correct solution to this is to define a way to turn a local
import into a relative URI, and resolve it against the appropriate base URI.
Resolving a relative reference is defined in RFC3986 section 5; we should reuse
that prior art rather than reinvent it ourselves. Languages should already have
support for resolving relative URIs. (Indeed, this is how dhall-golang
implements import chaining; it does not naively implement the judgment rules in
the spec, but rather calls the `URL.ResolveReference()` method from the Go
standard library package `net/url`.)
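As a rough illustration of the kind of library support meant here, the sketch below resolves relative references against a base URL with Go's `net/url`. The `resolve` helper, the `example.com` host and the file names are assumptions made up for this example; this is not the actual dhall-golang code.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolve is a hypothetical helper: it resolves the relative reference ref
// against base using the RFC 3986 section 5 algorithm implemented by
// (*url.URL).ResolveReference.
func resolve(base, ref string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	r, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(r).String(), nil
}

func main() {
	// The base URL and the references are made-up examples.
	base := "https://example.com/pkg/foo.dhall"
	for _, ref := range []string{"./bar.dhall", "../baz.dhall"} {
		out, err := resolve(base, ref)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
	// Prints:
	// https://example.com/pkg/bar.dhall
	// https://example.com/baz.dhall
}
```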
In the above example, to chain the import `./foo%20bar.dhall` onto a remote
import, we would first convert the path to a relative reference
`./foo%2520bar.dhall` (where the percent character has itself been
percent-encoded), resolve that reference with the appropriate base URI, make the
network request, and the server would then correctly interpret the
`foo%2520bar.dhall` path segment as a request for a file called
`foo%20bar.dhall`. Similarly, to chain the import `./"foo bar.dhall"`, we would
convert to a relative reference `./foo%20bar.dhall`, resolve relative to a base
URI, make the request, and the server would correctly interpret the path segment
as a request for the file `foo bar.dhall`.
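The two conversions just described can be sketched in Go as follows. This is only a minimal sketch under stated assumptions: `toRelativeReference` and `chain` are hypothetical names, the local import is assumed to be already parsed into its literal path segments, only `./`-style imports are handled, and `example.com` is a placeholder host.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// toRelativeReference is a hypothetical helper: it percent-encodes each
// literal local path segment and joins the results into a "./"-prefixed
// relative reference.
func toRelativeReference(segments []string) string {
	escaped := make([]string, len(segments))
	for i, s := range segments {
		// PathEscape encodes "%" as "%25" and " " as "%20".
		escaped[i] = url.PathEscape(s)
	}
	return "./" + strings.Join(escaped, "/")
}

// chain resolves a local import, given as literal path segments, against
// the URL of the remote file that contains it.
func chain(base string, segments []string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	r, err := url.Parse(toRelativeReference(segments))
	if err != nil {
		return "", err
	}
	return b.ResolveReference(r).String(), nil
}

func main() {
	base := "https://example.com/pkg/foo.dhall" // made-up base URL
	for _, name := range []string{"foo%20bar.dhall", "foo bar.dhall"} {
		out, err := chain(base, []string{name})
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
	// Prints:
	// https://example.com/pkg/foo%2520bar.dhall
	// https://example.com/pkg/foo%20bar.dhall
}
```

Escaping segment by segment keeps the literal file name intact on the wire: the server decodes `foo%2520bar.dhall` back to `foo%20bar.dhall`, and `foo%20bar.dhall` back to `foo bar.dhall`.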
Once we have done this, the syntax of remote URLs does not have to match local
paths at all. Then we can return to the happy familiarity of true
RFC3986-compliant URLs.
My proposal is thus:
- change the import semantics to define chaining local imports onto remote
  imports in terms of RFC3986 relative reference resolution;
- change `dhall.abnf` to allow any RFC3986-compliant URL, in addition to the
  currently-allowed URLs with path components. This will need some care, but
  it will not be a breaking change because we are only allowing new URLs, not
  removing or changing the meaning of existing URLs. (It is helpful here that
  the double quote character `"` is not a valid character in RFC3986).
- change `dhall format` to prefer percent-encoded URLs over quoted-path URLs
- deprecate non-RFC3986 URLs (including quoted-path URLs)
- after a suitable deprecation period, remove non-RFC3986 URLs from the
  language
At this point, Dhall URL syntax will just be standard URL syntax; but Dhall
files will still be relocatable between local and remote without breaking
things. I think this is the best of all worlds.