Commit 488afff

Merge pull request #43 from garyb/next
Next version
2 parents 293ff97 + 4aec4c5 commit 488afff

64 files changed

+4386
-1562
lines changed

README.md

Lines changed: 133 additions & 1 deletion
@@ -12,6 +12,138 @@ A type-safe parser, printer, and ADT for URLs and URIs based on [RFC 3986](http:
1212
bower install purescript-uri
1313
```
1414

15-
## Documentation
15+
## Getting started
16+
17+
The types and names here are a fairly faithful representation of the components described in the spec.
18+
19+
- [`URI`][URI] is for absolutely specified URIs that can also have path, query, and fragment (hash) parts.
20+
- [`AbsoluteURI`][AbsoluteURI] is a variation on `URI` that drops the ability for the URI to carry a fragment.
21+
- [`RelativeRef`][RelativeRef] is for relatively specified URIs that can also have path, query, and fragment (hash) parts.
22+
- [`URIRef`][URIRef] is a combination of `URI` and `RelativeRef`, allowing the full range of representable URIs.
23+
24+
The absolute/relative terminology, when applied to URIs, does not relate to the paths that a URI may carry; it refers to whether the URI has a "scheme" or not. For example, `http://example.com` and `file://../test.txt` are absolute URIs, but `//example.com` and `/test.txt` are relative.
25+
26+
Assuming none of the `unsafe`-prefixed functions are used when constructing a URI, it should be impossible to construct an invalid URI using the types this library provides*. The slight downside is that the data structures are relatively complex, since they are designed to admit only correct possibilities.
27+
28+
\* Actually, there is one exception to that - `IPv6Address` is far too forgiving in what it allows currently. Contributions welcome!
29+
30+
### URI component representations
31+
32+
Due to the differing needs of users of this library, the URI types are all parameterised to allow custom representations to be used for parts of the URI. Take a look at the most heavily parameterised type, `URIRef`:
33+
34+
``` purescript
35+
type URIRef userInfo hosts path hierPath relPath query fragment = ...
36+
```
37+
38+
This allows us to provide hooks into the parsing and printing processes for a URI, so that types better suited to the intended use case can be used.
39+
40+
Taking `userInfo` as an example, according to the spec, the `user-info` part of an authority is just an arbitrary string of characters terminated by an `@` before a hostname. An extremely common usage for this is the `user:password` scheme, so by leaving the choice of representation as a type variable we can switch it out for a type specifically designed to handle that (this library includes one actually, under [`URI.Extra.UserPassInfo`][UserPassInfo]).
41+
42+
### App-specific URI type definitions
43+
44+
When using this library, you'll probably want to define type synonyms for the URIs that make sense for your use case. A URI type that uses the simple representations for each component will look something like this:
45+
46+
``` purescript
47+
type MyURI = URIRef UserInfo (HostPortPair Host Port) Path HierPath RelPath Query Fragment
48+
```
49+
50+
Along with these types, you'll want to define an options record that specifies how to parse and print URIs. It looks like this:
51+
52+
``` purescript
53+
options ∷ Record (URIRefOptions UserInfo (HostPortPair Host Port) Path HierPath RelPath Query Fragment)
54+
options =
55+
{ parseUserInfo: pure
56+
, printUserInfo: id
57+
, parseHosts: HostPortPair.parser pure pure
58+
, printHosts: HostPortPair.print id id
59+
, parsePath: pure
60+
, printPath: id
61+
, parseHierPath: pure
62+
, printHierPath: id
63+
, parseRelPath: pure
64+
, printRelPath: id
65+
, parseQuery: pure
66+
, printQuery: id
67+
, parseFragment: pure
68+
, printFragment: id
69+
}
70+
```
71+
72+
As you can see by all the `pure` and `id`, we're not doing a whole lot here. `parseHosts` is a bit of an exception, but that's just due to the way that case is handled (see [later in this README](#host-parsing) for more details about that).
73+
74+
These types ([`UserInfo`][UserInfo], [`HostPortPair`][HostPortPair], [`Host`][Host], etc.) are all provided by the library, and where necessary can only be constructed via a smart constructor. This ensures that percent-encoding is applied to characters where necessary, so that the constructed values will print as valid URIs.
75+
76+
If we decided that we wanted to support `user:password` style user-info, we'd modify this by changing our type to use [`UserPassInfo`][UserPassInfo]:
77+
78+
``` purescript
79+
type MyURI = URIRef UserPassInfo (HostPortPair Host Port) Path HierPath RelPath Query Fragment
80+
```
81+
82+
And update our options to use the appropriate parse/print functions accordingly:
83+
84+
``` purescript
85+
options ∷ Record (URIRefOptions UserPassInfo (HostPortPair Host Port) Path HierPath RelPath Query Fragment)
86+
options =
87+
{ parseUserInfo: UserPassInfo.parse
88+
, printUserInfo: UserPassInfo.print
89+
, ...
90+
```
91+
92+
### Writing custom component types
93+
94+
These `parse/print` functions all share much the same shape of signature. For the case in the previous example, they come out as:
95+
96+
``` purescript
97+
parseUserInfo ∷ UserInfo → Either URIPartParseError UserPassInfo
98+
printUserInfo ∷ UserPassInfo → UserInfo
99+
```
100+
101+
So you can see that for each component, when a custom representation is supplied through the options hooks, we take one of these library-provided component types and parse it into our new representation, and later print it back to that simple type.
102+
103+
Each of the library-provided component types has a `toString` function that extracts the inner value as a string after applying percent-decoding, and an `unsafeToString` that provides exactly the value that was parsed, preserving percent-encoding. Similarly, there's a `fromString` that performs the minimal amount of required percent-encoding for that part of the URI, and an `unsafeFromString` that performs no encoding at all.
104+
105+
You may ask why it's ever useful to have access to the encoded values, or to be able to print without encoding, so here's a motivating example:
106+
107+
For the [`UserPassInfo`][UserPassInfo] example, the typical way of encoding a username or password that contains a colon within it is to use `%3A` (`us:er` becomes `us%3Aer`). This allows colons-within-the-values to be recognised as independent from the colon-separating-username-and-password (`us%3Aer:password`).
108+
109+
According to the spec it is not a requirement to encode colons in this part of the URI scheme, so just using [`toString`][UserInfo.toString] on `us:er` will get us back a `us:er`, resulting in `us:er:password`, so we'd have no way of knowing where the user ends and where the password starts.
110+
111+
The solution when printing is to do some custom encoding that also replaces `:` with `%3A` for the user/password parts, and then joins them with the unencoded `:` afterwards. If we constructed the resulting [`UserInfo`][UserInfo] value with [`fromString`][UserInfo.fromString] it would re-encode our already encoded user/password parts (giving us `%253A` instead of `%3A`), so we use [`unsafeFromString`][UserInfo.unsafeFromString] since we've done the encoding ourselves.
112+
113+
Similarly, when parsing these values back, we want to split on `:` and then percent-decode the user/password parts individually, so we need to use [`unsafeToString`][UserInfo.unsafeToString] to ensure we get the encoded version.
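The encode-then-join and split-then-decode steps described above can be sketched on plain strings. This is only an illustration under stated assumptions, not the implementation in `URI.Extra.UserPassInfo`: the module name, the `UserPass` record, and all function names are hypothetical, the password is treated as mandatory, and only `%` and `:` are escaped (lowercase `%3a` is not handled).

``` purescript
module UserPassSketch where

import Prelude

import Data.Maybe (Maybe(..))
import Data.String (Pattern(..), Replacement(..), replaceAll, split)

-- Hypothetical stand-in for a user/password pair (not the library's type).
type UserPass = { user ∷ String, password ∷ String }

-- Escape `%` first so the round trip is lossless, then escape `:`.
encodePart ∷ String → String
encodePart =
  replaceAll (Pattern ":") (Replacement "%3A")
    <<< replaceAll (Pattern "%") (Replacement "%25")

-- Undo `encodePart`: restore `:` first, then `%`.
decodePart ∷ String → String
decodePart =
  replaceAll (Pattern "%25") (Replacement "%")
    <<< replaceAll (Pattern "%3A") (Replacement ":")

-- Encode each part, then join with an unencoded `:`. The result is already
-- encoded, which is why it must not be passed through an encoder again.
printUserPass ∷ UserPass → String
printUserPass up = encodePart up.user <> ":" <> encodePart up.password

-- Split the still-encoded text on `:`, then decode each part individually.
parseUserPass ∷ String → Maybe UserPass
parseUserPass s = case split (Pattern ":") s of
  [ u, p ] → Just { user: decodePart u, password: decodePart p }
  _ → Nothing

-- Encoding twice turns `%3A` into `%253A`, the double-encoding problem
-- mentioned in the text.
doubleEncoded ∷ String
doubleEncoded = encodePart (encodePart "us:er")
```

In terms of the library, `printUserPass` corresponds to the step whose result would be wrapped with `unsafeFromString`, and `parseUserPass` to the step that would start from the output of `unsafeToString`.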
114+
115+
Another example where this sort of thing might be useful is if you would like to encode/decode spaces in paths as `+` rather than `%20`. Having the ability to hook into the parse/print stage and choose to examine or print with or without percent encoding/decoding applied gives us the flexibility to produce and consume values exactly as we want, rather than the library attempting to know best in all cases.
116+
117+
### Host parsing
118+
119+
The host printing/parsing setup is a little different. This is to accommodate something that lies outside of the RFC 3986 spec: multiple host definitions within a URI. The motivating case for this is things like connection strings for MongoDB, where host/port pairs can be defined separated by commas within a single URI:
120+
121+
```
122+
mongodb://db1.example.net:27017,db2.example.net:2500/?replicaSet=test
123+
```
124+
125+
This doesn't jibe with RFC 3986: a comma is allowed there as part of a hostname, but the multiple ports don't fit into the schema. To get around this, host parsing is handed over entirely to the `parseHosts` parser in the options (for the other parameters, a normal function is run on a value that has already been parsed according to the spec).
126+
127+
For normal URIs the [`HostPortPair`][HostPortPair] parser/printer should serve well enough. This accepts functions to deal with the host/port parts allowing for those aspects to be dealt with much like all the other options.
128+
129+
For URIs that are like the MongoDB connection string, this library provides [`URI.Extra.MultiHostPortPair`][MultiHostPortPair]. Given that both of these allow for custom `Host` / `Port` types, hopefully nobody else will need to write anything for the general host-section-parsing part!
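To make the multi-host shape concrete, here is a minimal plain-string sketch of splitting the host section of the MongoDB example into host/port pairs. It is an assumption-laden illustration, not how `URI.Extra.MultiHostPortPair` is implemented: the module name, the `HostPort` record, and the function names are hypothetical, and IPv6 literals (which contain colons) are deliberately ignored.

``` purescript
module MultiHostSketch where

import Prelude

import Data.Int as Int
import Data.Maybe (Maybe(..))
import Data.String (Pattern(..), split)

-- Hypothetical stand-in for a host/port pair; the library's types are richer.
type HostPort = { host ∷ String, port ∷ Maybe Int }

-- Split one `host:port` entry. A missing or non-numeric port becomes `Nothing`.
parseHostPort ∷ String → HostPort
parseHostPort s = case split (Pattern ":") s of
  [ h, p ] → { host: h, port: Int.fromString p }
  _ → { host: s, port: Nothing }

-- Split the whole host section on commas, then parse each entry.
parseHostSection ∷ String → Array HostPort
parseHostSection = map parseHostPort <<< split (Pattern ",")

-- The host section of the connection string shown above.
example ∷ Array HostPort
example = parseHostSection "db1.example.net:27017,db2.example.net:2500"
```

A real `parseHosts` option is a parser rather than a function on an already-extracted string, so this only shows the target structure, not the hook itself.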
130+
131+
## Further documentation
132+
133+
[The tests](test/) contain many examples of URI constructions using the basic types this library provides.
16134

17135
Module documentation is [published on Pursuit](http://pursuit.purescript.org/packages/purescript-uri).
136+
137+
[AbsoluteURI]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.AbsoluteURI
138+
[Host]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.Host
139+
[HostPortPair]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.HostPortPair
140+
[MultiHostPortPair]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.Extra.MultiHostPortPair
141+
[RelativeRef]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.RelativeRef
142+
[URI]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.URI
143+
[URIRef]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.URIRef
144+
[UserInfo.fromString]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.UserInfo#v:fromString
145+
[UserInfo.toString]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.UserInfo#v:toString
146+
[UserInfo.unsafeFromString]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.UserInfo#v:unsafeFromString
147+
[UserInfo.unsafeToString]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.UserInfo#v:unsafeToString
148+
[UserInfo]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.UserInfo
149+
[UserPassInfo]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.Extra.UserPassInfo

bower.json

Lines changed: 7 additions & 6 deletions
@@ -17,17 +17,18 @@
1717
"package.json"
1818
],
1919
"dependencies": {
20+
"purescript-arrays": "^4.3.0",
21+
"purescript-generics-rep": "^5.2.0",
2022
"purescript-globals": "^3.0.0",
2123
"purescript-integers": "^3.0.0",
2224
"purescript-maps": "^3.0.0",
23-
"purescript-pathy": "^4.0.0",
24-
"purescript-string-parsers": "^3.0.0",
25+
"purescript-parsing": "^4.3.1",
26+
"purescript-profunctor-lenses": "^3.7.0",
2527
"purescript-unfoldable": "^3.0.0",
26-
"purescript-generics-rep": "^5.2.0",
27-
"purescript-profunctor-lenses": "^3.7.0"
28+
"purescript-these": "^3.0.0"
2829
},
2930
"devDependencies": {
30-
"purescript-test-unit": "11.0.0",
31-
"purescript-quickcheck": "^4.4.0"
31+
"purescript-quickcheck": "^4.4.0",
32+
"purescript-spec": "^2.0.0"
3233
}
3334
}

src/Data/URI.purs

Lines changed: 0 additions & 31 deletions
This file was deleted.

src/Data/URI/AbsoluteURI.purs

Lines changed: 0 additions & 75 deletions
This file was deleted.

src/Data/URI/Authority.purs

Lines changed: 0 additions & 67 deletions
This file was deleted.
