Skip to content

Next version #43

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 70 commits into from
Mar 20, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
70 commits
Select commit Hold shift + click to select a range
176af66
Make Sceheme requred for non-relative URIs
garyb Jan 26, 2018
4356849
WIP
garyb Jan 29, 2018
15b8078
WIP
garyb Jan 29, 2018
df02b9e
WIP
garyb Jan 29, 2018
b843e7f
Split "nonstandard" types out
garyb Jan 29, 2018
8b316de
Record for parser/printer components
garyb Jan 29, 2018
aeae895
Free hosts from being an array
garyb Jan 29, 2018
c3decaf
Use functions for component parsing
garyb Jan 30, 2018
f528b7a
Remove some junk from Common, avoid regex parsers (mostly)
garyb Jan 30, 2018
372b3af
Style consistency for imports
garyb Jan 30, 2018
8fbf064
Delete old test for now-deleted `match1`
garyb Jan 30, 2018
14273b0
Add host/port parameters, introduce some encoded-value types
garyb Feb 1, 2018
97534e6
Introduce encoded-value type for Query
garyb Feb 1, 2018
9730451
Split up tests, update path representation
garyb Feb 3, 2018
e5e7b11
More tests, fix some re-exports
garyb Feb 5, 2018
c917296
Update to use preferred test/parse libraries
garyb Feb 5, 2018
2fed7b5
Some progress on valid handling of `,` & multi-host support
garyb Feb 6, 2018
0386d26
Fix warning
garyb Feb 6, 2018
3da1466
Fully parameterise host/port parsing
garyb Feb 6, 2018
49d3168
Safe construction for a few more values
garyb Feb 6, 2018
9a15157
Restore `QueryPairs` extra structure
garyb Feb 6, 2018
d5a5f33
Introduce `MultiHostPortPair`
garyb Feb 6, 2018
25aa142
Merge remote-tracking branch 'slamdata/master' into next
garyb Feb 6, 2018
281a512
Add extra for `user:password` handling in `UserInfo`
garyb Feb 7, 2018
33468dd
use NonEmptyString for RegName and PathSegmentNZ
safareli Feb 27, 2018
2c177fd
use crashWith for Scheme
safareli Feb 27, 2018
17c1408
add Semigroup and Monoid for Path
safareli Feb 27, 2018
3fba380
use NonEmptyString for unsafeFromString
safareli Feb 27, 2018
dcafee8
use NonEmptyString for PathSegmentNZNC
safareli Feb 27, 2018
36c51c0
update unsafe usages
safareli Feb 27, 2018
802dd78
Add crashes for illegal unsafe port/ipv4 address constructions
garyb Feb 28, 2018
4614557
Fix illegal port in tests
garyb Feb 28, 2018
619ed07
Move namespace to just URI
garyb Feb 28, 2018
308c2ee
Add details about the new setup to the README
garyb Mar 1, 2018
c6b8cec
Add doc comments for Scheme / use NES internally
garyb Mar 1, 2018
e26df37
Use NES for UserInfo internally, decouple parser composition
garyb Mar 1, 2018
c4323cc
Decouple parser composition from component parsers
garyb Mar 1, 2018
678bbbe
Clean up NES usage in UserInfo
garyb Mar 1, 2018
4d993d7
Use NonEmptyString for RegName
garyb Mar 1, 2018
56975f1
Simplify tests slightly with `nes` helper
garyb Mar 1, 2018
b11ee9a
Add fragment doc comments
garyb Mar 1, 2018
9cbc350
Add query doc comments
garyb Mar 1, 2018
df188b5
Tweak wording and add examples in UserInfo
garyb Mar 1, 2018
35d0658
Add doc comments for RegName, and print function
garyb Mar 1, 2018
b4843b0
Doc comments for IPv4Address
garyb Mar 1, 2018
a6dfbc2
Add more `Gen`s, doc comments, tests
garyb Mar 1, 2018
2d2f264
Avoid redefining some token parsers & rename some
garyb Mar 2, 2018
0015d5a
Last few doc comments for common
garyb Mar 2, 2018
daae752
Authority doc comments, remove unused synonym
garyb Mar 2, 2018
23df58e
Wording consistency tweak
garyb Mar 2, 2018
78119e5
Doc comments for HostPortPair
garyb Mar 2, 2018
98e99ad
Port doc comments
garyb Mar 2, 2018
64478e6
Tweak examples of authority option types usage
garyb Mar 2, 2018
5d81aa9
HierarchicalPart doc comments
garyb Mar 2, 2018
6316dd0
Newtype instance for UserPassInfo
safareli Mar 2, 2018
f868610
Path doc comments (and added TODO for NEA note)
garyb Mar 2, 2018
8dbdf12
Make `Path` match path-abempty from the spec
garyb Mar 2, 2018
ae42bc5
Doc comments for path varieties
garyb Mar 2, 2018
2d3a801
Adjust some segment stuff for consistency
garyb Mar 2, 2018
131b869
The -> a
garyb Mar 2, 2018
18365fd
Path segment doc comments
garyb Mar 2, 2018
d3cc178
Remove `';'` from `keyPartChar`
safareli Mar 16, 2018
4fb7a71
use in QueryPairs tests
safareli Mar 16, 2018
58a08b7
Add common URI schemes
garyb Mar 19, 2018
5dd71e8
Scheme `fromString` should fail if input is not fully consumed
garyb Mar 19, 2018
8047c1b
Make `pctEncoded` return a `NonEmptyString`
garyb Mar 19, 2018
7e612b3
More doc comment changes
garyb Mar 19, 2018
ed35231
Export char parsers used for printEncoded, etc
garyb Mar 20, 2018
a0ee260
Fix accidental loss of `try` in `URIRef` parser
garyb Mar 20, 2018
4aec4c5
Doc comments for the extras
garyb Mar 20, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
134 changes: 133 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,138 @@ A type-safe parser, printer, and ADT for URLs and URIs based on [RFC 3986](http:
bower install purescript-uri
```

## Documentation
## Getting started

The types and names here are a fairly faithful representation of the components described in the spec.

- [`URI`][URI] is for absolutely specified URIs that can also have path, query, and fragment (hash) parts.
- [`AbsoluteURI`][AbsoluteURI] is a variation on `URI` that drops the ability for the URI to carry a fragment.
- [`RelativeRef`][RelativeRef] is for relatively specified URIs that can also have path, query, and fragment (hash) parts.
- [`URIRef`][URIRef] is combination of `URI` and `RelativeRef`, allowing the full range of representable URIs.

The absolute/relative terminology when applied to URIs does not relate to the paths that a URI may carry, it refers to whether the URI has a "scheme" or not. For example `http://example.com` and `file://../test.txt` are absolute URIs but `//example.com` and `/test.txt` are relative.

Assuming none of the `unsafe`-prefixed functions are used when constructing a URI, it should be impossible to construct a URI that is invalid using the types this library provides*. The slight downside of this is the data structures are relatively complex so as to only admit correct possibilities.

\* Actually, there is one exception to that - `IPv6Address` is far too forgiving in what it allows currently. Contributions welcome!

### URI component representations

Due to the differing needs of users of this library, the URI types are all parameterised to allow for custom representations to be used for parts of the URI. Take a look at the most heavily parametrised type, `URIRef`:

``` purescript
type URIRef userInfo hosts path hierPath relPath query fragment = ...
```

This allows us to provide hooks into the parsing and printing processes for a URI, so that types better suited to the intended use case can be used.

Taking `userInfo` as an example, according to the spec, the `user-info` part of an authority is just an arbitrary string of characters terminated by an `@` before a hostname. An extremely common usage for this is the `user:password` scheme, so by leaving the choice of representation as a type variable we can switch it out for a type specifically designed to handle that (this library includes one actually, under [`URI.Extra.UserPassInfo`][UserPassInfo]).

### App-specific URI type definitions

When using this library, you'll probably want to define type synonyms for the URIs that make sense for your use case. A URI type that uses the simple representations for each component will look something like this:

``` purescript
type MyURI = URIRef UserInfo (HostPortPair Host Port) Path HierPath RelPath Query Fragment
```

Along with these types, you'll want to define an options record that specifies how to parse and print URIs that look like this:

``` purescript
options ∷ Record (URIRefOptions UserInfo (HostPortPair Host Port) Path HierPath RelPath Query Fragment)
options =
{ parseUserInfo: pure
, printUserInfo: id
, parseHosts: HostPortPair.parser pure pure
, printHosts: HostPortPair.print id id
, parsePath: pure
, printPath: id
, parseHierPath: pure
, printHierPath: id
, parseRelPath: pure
, printRelPath: id
, parseQuery: pure
, printQuery: id
, parseFragment: pure
, printFragment: id
}
```

As you can see by all the `pure` and `id`, we're not doing a whole lot here. `parseHosts` is a bit of an exception, but that's just due to the way that case is handled (see [later in this README](#host-parsing) for more details about that).

These types ([`UserInfo`][UserInfo], [`HostPortPair`][HostPortPair], [`Host`][Host], etc.) are all provided by the library, and where necessary can only be constructed via smart constructor. This ensures that percent-encoding is applied to characters where necessary to ensure the constructed values will print as valid URIs, and so on.

If we decided that we wanted to support `user:password` style user-info, we'd modify this by changing our type to use [`UserPassInfo`][UserPassInfo]:

``` purescript
type MyURI = URIRef UserPassInfo (HostPortPair Host Port) Path HierPath RelPath Query Fragment
```

And update our options to use the appropriate parse/print functions accordingly:

``` purescript
options ∷ Record (URIRefOptions UserPassInfo (HostPortPair Host Port) Path HierPath RelPath Query Fragment)
options =
{ parseUserInfo: UserPassInfo.parse
, printUserInfo: UserPassInfo.print
, ...
```

### Writing custom component types

These `parse/print` functions all share much the same shape of signature. For the case in the previous example, they come out as:

``` purescript
parseUserInfo ∷ UserInfo → Either URIPartParseError UserPassInfo
printUserInfo ∷ UserPassInfo → UserInfo
```

So you can see that for each component, when the options hooks/custom representation stuff is used, we take one of these library-provided component types and parse it into our new representation, and also print it back to that simple type later.

Each of the library-provided component types have a `toString` function that extracts the inner value as a string after applying percent-decoding, and an `unsafeToString` that provides exactly the value that was parsed, preserving percent decoding. Similarly, there's a `fromString` that performs the minimal amount of required percent encoding for that part of the URI, and an `unsafeFromString` that performs no encoding at all.

You may ask why it's ever useful to have access to the encoded values, or to be able to print without encoding, so here's a motivating example:

For the [`UserPassInfo`][UserPassInfo] example, the typical way of encoding a username or password that contains a colon within it is to use `%3A` (`us:er` becomes `us%3Aer`). This allows colons-within-the-values to be recongised as independent from the colon-separating-username-and-password (`us%3Aer:password`).

According to the spec it is not a requirement to encode colons in this part of the URI scheme, so just using [`toString`][UserInfo.toString] on `us:er` will get us back a `us:er`, resulting in `us:er:password`, so we'd have no way of knowing where the user ends and where the password starts.

The solution when printing is to do some custom encoding that also replaces `:` with `%3A` for the user/password parts, and then joins them with the unencoded `:` afterwards. If we constructed the resulting [`UserInfo`][UserInfo] value with [`fromString`][UserInfo.fromString] it would re-encode our already encoded user/password parts (giving us `%253A` instead of `%3A`), so we use [`unsafeFromString`][UserInfo.unsafeFromString] since we've done the encoding ourselves.

Similarly, when parsing these values back, we want to split on `:` and then percent-decode the user/password parts individually, so we need to use [`unsafeToString`][UserInfo.unsafeToString] to ensure we get the encoded version.

Another example where this sort of thing might be useful is if you would like to encode/decode spaces in paths as `+` rather than `%20`. Having the ability to hook into the parse/print stage and choose to examine or print with or without percent encoding/decoding applied gives us the flexibility to produce and consume values exactly as we want, rather than the library attempting to know best in all cases.

### Host parsing

The host printing/parsing setup is a little different. This is to accommodate something that lies outside of the RFC 3986 spec: multiple host definitions within a URI. The motivating case for this is things like connection strings for MongoDB, where host/port pairs can be defined separated by commas within a single URI:

```
mongodb://db1.example.net:27017,db2.example.net:2500/?replicaSet=test
```

This doesn't jive with what is said in RFC 3986, as there a comma is allowed as part of a hostname, but the multiple ports don't fit into the schema. To get around this, when it comes to parsing hosts, the parsing is entirely handed over to the `parseHosts` parser in the options (in the cases for the other parameters, a normal function is run on a value that has been parsed according to the spec already).

For normal URIs the [`HostPortPair`][HostPortPair] parser/printer should serve well enough. This accepts functions to deal with the host/port parts allowing for those aspects to be dealt with much like all the other options.

For URIs that are like the MongoDB connection string, this library provides [`URI.Extra.MultiHostPortPair`][MultiHostPortPair]. Given that both of these allow for custom `Host` / `Port` types, hopefully nobody else will need to write anything for the general host-section-parsing part!

## Further documentation

[The tests](test/) contain many examples of URI constructions using the basic types this library provides.

Module documentation is [published on Pursuit](http://pursuit.purescript.org/packages/purescript-uri).

[AbsoluteURI]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.AbsoluteURI
[Host]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.Host
[HostPortPair]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.HostPortPair
[MultiHostPortPair]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.Extra.MultiHostPortPair
[RelativeRef]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.RelativeRef
[URI]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.URI
[URIRef]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.URIRef
[UserInfo.fromString]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.UserInfo#v:fromString
[UserInfo.toString]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.UserInfo#v:toString
[UserInfo.unsafeFromString]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.UserInfo#v:unsafeFromString
[UserInfo.unsafeToString]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.UserInfo#v:unsafeToString
[UserInfo]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.UserInfo
[UserPassInfo]: https://pursuit.purescript.org/packages/purescript-uri/docs/URI.Extra.UserPassInfo
13 changes: 7 additions & 6 deletions bower.json
Original file line number Diff line number Diff line change
Expand Up @@ -17,17 +17,18 @@
"package.json"
],
"dependencies": {
"purescript-arrays": "^4.3.0",
"purescript-generics-rep": "^5.2.0",
"purescript-globals": "^3.0.0",
"purescript-integers": "^3.0.0",
"purescript-maps": "^3.0.0",
"purescript-pathy": "^4.0.0",
"purescript-string-parsers": "^3.0.0",
"purescript-parsing": "^4.3.1",
"purescript-profunctor-lenses": "^3.7.0",
"purescript-unfoldable": "^3.0.0",
"purescript-generics-rep": "^5.2.0",
"purescript-profunctor-lenses": "^3.7.0"
"purescript-these": "^3.0.0"
},
"devDependencies": {
"purescript-test-unit": "11.0.0",
"purescript-quickcheck": "^4.4.0"
"purescript-quickcheck": "^4.4.0",
"purescript-spec": "^2.0.0"
}
}
31 changes: 0 additions & 31 deletions src/Data/URI.purs

This file was deleted.

75 changes: 0 additions & 75 deletions src/Data/URI/AbsoluteURI.purs

This file was deleted.

67 changes: 0 additions & 67 deletions src/Data/URI/Authority.purs

This file was deleted.

Loading