Add document for Unicode casemapping #272
Changes from 2 commits
49c702b
8f48b1c
357cde6
0a605cd
36cfa0c
f1a3082
382c522
0d6435d
fb9dc99
807e084
fa23d59
634fb6d
840aa78
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,153 @@ | ||
--- | ||
title: rfc7700 IRC Casemapping | ||
layout: spec | ||
copyrights: | ||
- | ||
name: "Daniel Oaks" | ||
period: "2016" | ||
email: "daniel@danieloaks.net" | ||
--- | ||
This document describes a unicode-aware casemapping for IRC, based on the recommendations in [RFC 7700](https://tools.ietf.org/html/rfc7700). | ||
|
||
Client and channel names in languages other than English are a much-desired feature. This casemapping allows for an extended set of characters while minimising the risks around allowing these characters, and avoiding the technical limitations of prior solutions to this problem. | |
|
||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119](http://tools.ietf.org/html/rfc2119). | ||
|
||
|
||
## Casemapping | ||
|
||
The server indicates that this casemapping is in use by advertising the `"CASEMAPPING"` `RPL_ISUPPORT` parameter with the value `"rfc7700"`. An example of this is shown below: | ||
|
||
:irc.example.com 005 dan CASEMAPPING=rfc7700 :are supported by this server | |
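As a rough illustration of how a client might detect this casemapping, the sketch below parses a single `RPL_ISUPPORT` (005) line. The helper names `parse_isupport` and `uses_rfc7700` are hypothetical and not part of the proposal, and the parsing is deliberately minimal (no message tags, no value unescaping).

```python
from typing import Dict


def parse_isupport(line: str) -> Dict[str, str]:
    """Parse the tokens of one RPL_ISUPPORT (005) line into a dict.

    Hypothetical helper for illustration only.
    """
    # Drop the optional ":server" source prefix.
    if line.startswith(":"):
        _, _, line = line.partition(" ")
    # Everything after " :" is the human-readable trailing text.
    head, _, _trailing = line.partition(" :")
    parts = head.split()
    if len(parts) < 2 or parts[0] != "005":
        return {}
    tokens: Dict[str, str] = {}
    # parts[1] is the client's nickname; the remaining words are tokens.
    for token in parts[2:]:
        key, _, value = token.partition("=")
        tokens[key] = value
    return tokens


def uses_rfc7700(line: str) -> bool:
    """True if the 005 line advertises CASEMAPPING=rfc7700."""
    return parse_isupport(line).get("CASEMAPPING") == "rfc7700"
```

A server may split its tokens over several 005 lines, so a real client would merge the dictionaries from each line before checking `CASEMAPPING`.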
|
||
When the `rfc7700` casemapping is in use, names (and other messages from the IRCd where possible) MUST be sent using the [UTF-8](https://tools.ietf.org/html/rfc3629) encoding. | ||
|
||
The `rfc7700` casemapping uses the PRECIS [Nickname profile][precis] as defined in [Section 2 of RFC 7700][precis]. | ||
|
||
[precis]: https://tools.ietf.org/html/rfc7700#section-2 | ||
|
||
|
||
### Order of Operations | ||
|
||
Names being prepared MUST apply the following rules in the order shown: | ||
|
||
1. Preperation using the PRECIS [Nickname profile][precis]. | ||
Review comment: s/Preperation/Preparation/
||
2. Check for restricted characters. | ||
|
||
These steps MUST happen in the order shown, or else the restricted characters check may miss characters that should be legitimately restricted. | ||
|
||
If a name does contain a restricted character (whether disallowed by the [Nickname profile](https://tools.ietf.org/html/rfc7700#section-2.2) or this document), it MUST be rejected by the server and MUST NOT be propagated to other clients. The server signals the rejection with the numeric appropriate to the command that tried to set or use the invalid name, such as `ERR_ERRONEUSNICKNAME` or `ERR_NOSUCHCHANNEL`. | |
Review comment: At this point better use the named link syntax:
The `rfc7700` casemapping uses the PRECIS [Nickname profile][precis] as defined in [Section 2 of RFC 7700][precis].
[precis]: https://tools.ietf.org/html/rfc7700#section-2
Reply: Cool, I changed the other links above to use the named link syntax since they were all referring to the same URL. The link here and the link to rfc7700 up the top differ from the others since they're linking to different sections (and the url's only replicated once throughout the doc).
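The two-step order of operations can be sketched as follows. This is only an approximation under stated assumptions: Python's standard library has no PRECIS implementation, so `approximate_prepare` is a hypothetical stand-in that trims spaces, case-folds, and applies NFKC, and it does not check for code points the Nickname profile disallows. `RESTRICTED_ALL` mirrors the "All Names" list in this document.

```python
import re
import unicodedata

# Characters restricted in every name type (the "All Names" list).
RESTRICTED_ALL = frozenset(" :,*?")


def approximate_prepare(name: str) -> str:
    """Hypothetical stand-in for PRECIS Nickname preparation.

    NOT a full implementation: disallowed code points are not checked.
    """
    # Strip leading/trailing spaces and collapse runs of spaces.
    trimmed = re.sub(" {2,}", " ", name.strip(" "))
    # Case folding followed by NFKC approximates the profile's
    # case mapping and normalization rules.
    return unicodedata.normalize("NFKC", trimmed.casefold())


def prepare_name(name: str) -> str:
    """Step 1: prepare the name. Step 2: check restricted characters."""
    prepared = approximate_prepare(name)
    found = sorted(RESTRICTED_ALL.intersection(prepared))
    if found:
        raise ValueError(f"restricted characters: {found!r}")
    return prepared
```

FULLWIDTH COLON (U+FF1A) shows why the order matters: it is not `':'` before preparation, but NFKC normalization turns it into `':'`, so only a check that runs after step 1 catches it.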
||
|
||
|
||
### Comparisons | ||
|
||
When comparing user names or channel names to check for equivalency, both strings are casemapped with the above process. After this, they are compared with a standard bit-for-bit comparison. | ||
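A minimal sketch of such a comparison, assuming a stand-in casemapping function: `standin_casemap` below (case folding plus NFKC) is hypothetical and only approximates the Nickname profile. The point illustrated is that both sides are casemapped first and the resulting UTF-8 bytes are then compared exactly.

```python
import unicodedata
from typing import Callable


def standin_casemap(name: str) -> str:
    """Hypothetical approximation of the rfc7700 casemapping."""
    return unicodedata.normalize("NFKC", name.casefold())


def names_equal(a: str, b: str,
                casemap: Callable[[str], str] = standin_casemap) -> bool:
    """Casemap both names, then compare their UTF-8 encodings exactly."""
    return casemap(a).encode("utf-8") == casemap(b).encode("utf-8")
```

With this stand-in, `names_equal("Dan", "dAN")` holds, and so does a comparison between a precomposed `"\u00c5"` and the decomposed sequence `"A\u030a"`, because normalization brings both to the same code points before the byte comparison.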
|
||
|
||
## Restricted Characters | ||
|
||
Certain characters mean special things to the protocol. This section details characters that are not allowed in specific name types. | ||
|
||
|
||
### All Names | ||
|
||
Nicknames, usernames, hostnames and channel names cannot contain the following characters: | ||
|
||
* `(' ', 0x20)` - Separates parameters. | ||
* `(':', 0x3A)` - Separates trailing parameter. | ||
* `(',', 0x2C)` - Used as a separator. | ||
* `('*', 0x2A)` - Used in mask matching. | ||
* `('?', 0x3F)` - Used in mask matching. | ||
|
||
To be clear, these characters will also be repeated below. | ||
|
||
#### Protocol Framing | ||
|
||
These characters are disallowed either by the protocol itself or because of how they are used in protocol framing. These won't be repeated below, as sane implementations either split messages on these characters or disallow them in the stream: | |
|
||
* `('\0', 0x00)` - Disallowed by the protocol. | ||
* `('\n', 0x0A)` - Used in message framing. | ||
* `('\r', 0x0D)` - Used in message framing. | ||
|
||
|
||
### Nicknames | ||
|
||
Nicknames cannot contain the following characters: | ||
|
||
* `(' ', 0x20)` - Separates parameters. | ||
* `(':', 0x3A)` - Separates trailing parameter. | ||
Review comment: This & identical entries below seem unnecessary;
Reply: It does, but since it's already disallowed and I could see it causing possible confusion with libraries that split parameters strangely, figured it was better to disallow it. If we figure it's not required I can definitely remove it though.
Reply: Might as well forbid
Reply: Hmm, that's fair. In that case, I can just note that the first letter of one can't be
Reply: Yeah, it should be fine under the "first character" list. (For channels it's already implied by CHANTYPE.)
||
* `(',', 0x2C)` - Used as a separator. | ||
* `('*', 0x2A)` - Used in mask matching. | ||
* `('?', 0x3F)` - Used in mask matching. | ||
* `('.', 0x2E)` - Denotes a server name. | ||
* `('!', 0x21)` - Separates nickname from username. | |
Review comment: Same as previously
||
* `('@', 0x40)` - Separates username from hostname. | ||
|
||
In addition to the above restrictions, the first character of a nickname cannot be: | ||
|
||
* `('-', 0x2D)` - Disallowed. | ||
* `('0', 0x30)` - Disallowed. | ||
* `('1', 0x31)` - Disallowed. | ||
* `('2', 0x32)` - Disallowed. | ||
* `('3', 0x33)` - Disallowed. | ||
* `('4', 0x34)` - Disallowed. | ||
* `('5', 0x35)` - Disallowed. | ||
* `('6', 0x36)` - Disallowed. | ||
* `('7', 0x37)` - Disallowed. | ||
* `('8', 0x38)` - Disallowed. | ||
* `('9', 0x39)` - Disallowed. | ||
Review comment: Not allowing numbers as the first char of a nick shouldn't be in the spec for these reasons:
Reply: There's nothing stopping servers from accepting a subset of nicks allowed by this spec (they can send an invalid nick numeric for any nick they don't like) so servers can still disallow digits if they want but they cannot allow more nicks than what this spec allows. Also clients must be prepared to see nicks starting with digits.
||
* Channel membership prefixes as defined in the `PREFIX` parameter of `RPL_ISUPPORT` - Protocol conflicts. | ||
* Channel prefixes as defined in the `CHANTYPES` parameter of `RPL_ISUPPORT` - Protocol conflicts. | ||
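The nickname rules as listed in this revision could be checked roughly as below. The function names are hypothetical, and the default `PREFIX` and `CHANTYPES` values (`"(ov)@+"` and `"#&"`) are assumed examples; a real server would use whatever it advertises in `RPL_ISUPPORT`. The check is meant to run on a name that has already been prepared.

```python
from typing import Optional

# Characters that may not appear anywhere in a nickname.
NICK_RESTRICTED = frozenset(" :,*?.!@")
# Fixed characters that may not start a nickname.
NICK_FIRST_RESTRICTED = frozenset("-0123456789")


def prefix_symbols(prefix_value: str) -> str:
    """Return the membership symbols from a PREFIX value such as "(ov)@+"."""
    _, _, symbols = prefix_value.partition(")")
    return symbols


def nickname_problem(nick: str,
                     prefix_value: str = "(ov)@+",  # assumed example value
                     chantypes: str = "#&"          # assumed example value
                     ) -> Optional[str]:
    """Return a description of the first problem found, or None if valid."""
    if not nick:
        return "empty nickname"
    for ch in nick:
        if ch in NICK_RESTRICTED:
            return f"restricted character {ch!r}"
    # The first character is also checked against the advertised
    # membership prefixes and channel prefixes.
    forbidden_first = (NICK_FIRST_RESTRICTED
                       | set(prefix_symbols(prefix_value))
                       | set(chantypes))
    if nick[0] in forbidden_first:
        return f"restricted first character {nick[0]!r}"
    return None
```

Under these assumed defaults, `"dan"` and `"d9an"` pass, while `"9dan"`, `"+dan"`, `"#dan"` and `"da.n"` are reported as problems.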
|
||
|
||
### Usernames | ||
|
||
Usernames cannot contain the following characters: | ||
|
||
* `(' ', 0x20)` - Separates parameters. | ||
* `(':', 0x3A)` - Separates trailing parameter. | ||
* `(',', 0x2C)` - Used as a separator. | ||
* `('*', 0x2A)` - Used in mask matching. | ||
* `('?', 0x3F)` - Used in mask matching. | ||
* `('!', 0x21)` - Separates nickname from username. | ||
* `('@', 0x40)` - Separates username from hostname. | ||
|
||
|
||
### Hostnames | ||
|
||
Hostnames cannot contain the following characters: | |
|
||
* `(' ', 0x20)` - Separates parameters. | ||
* `(':', 0x3A)` - Separates trailing parameter. | ||
Review comment: IPv6 IPs need
Reply: They even technically can have it as the first character. Which is a bit problematic for 352 RPL_WHOREPLY and 311 RPL_WHOISUSER.
Reply: Servers add a
Reply: My mistake, meant to remove those with another change. This has been removed.
||
* `(',', 0x2C)` - Used as a separator. | ||
* `('*', 0x2A)` - Used in mask matching. | ||
* `('?', 0x3F)` - Used in mask matching. | ||
* `('!', 0x21)` - Separates nickname from username. | ||
* `('@', 0x40)` - Separates username from hostname. | ||
|
||
Hostnames that are looked up on client connection should also be checked against the traditional hostname rules to ensure they are valid. | |
|
||
|
||
### Channel Names | ||
|
||
Channel names must start with a channel prefix as defined in the `CHANTYPES` parameter of `RPL_ISUPPORT` (005). | ||
|
||
Channel names cannot contain the following characters: | ||
|
||
* `(' ', 0x20)` - Separates parameters. | ||
* `(',', 0x2C)` - Used as a separator. | ||
* `('*', 0x2A)` - Used in mask matching. | ||
* `('?', 0x3F)` - Used in mask matching. | ||
Review comment: Is mask matching in channel names a thing?
Reply: Cool, removed those
Reply: We (InspIRCd) use glob matching on channel names in various places.
Reply: @SaberUK Example? Do you also forbid those characters in channel names or is there just no way to specify them without accidentally over-globbing? Actually, I just tested and was able to create a channel on Insp with both
Reply: @jwheare We don't presently forbid them although they are used in various places like e.g. https://github.com/inspircd/inspircd/blob/master/docs/conf/modules.conf.example#L714 This does unfortunately result in some problems like what you mentioned though.
Reply: It is possible to block them, as documented: https://github.com/inspircd/inspircd/blob/insp20/docs/conf/modules.conf.example#L417-L419
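For illustration, the channel name rules as listed in this revision of the diff could be checked along these lines. The function name is hypothetical and the default `CHANTYPES` value of `"#&"` is an assumed example. The restricted set mirrors the Channel Names list exactly as shown above, which the review thread suggests may still change.

```python
from typing import Optional

# Mirrors the Channel Names list as shown in this revision.
CHANNEL_RESTRICTED = frozenset(" ,*?")


def channel_name_problem(name: str,
                         chantypes: str = "#&"  # assumed example value
                         ) -> Optional[str]:
    """Return a description of the first problem found, or None if valid."""
    # The name must begin with one of the advertised channel prefixes.
    if not name or name[0] not in chantypes:
        return "missing channel prefix"
    for ch in name:
        if ch in CHANNEL_RESTRICTED:
            return f"restricted character {ch!r}"
    return None
```

With the assumed defaults, `"#irc"` passes, `"irc"` fails the prefix check, and `"#a,b"` is rejected for the comma.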
||
|
||
|
||
## Security Considerations | ||
|
||
New unicode-capable casemappings have a considerable security impact. | ||
|
||
With the large numbers of new characters allowed comes the risk of introducing confusion for users. The PRECIS framework (much like the earlier framework [stringprep](https://tools.ietf.org/html/rfc3454)) aims to avoid this through mapping confusable characters to a single base character, and by allowing specific known-good characters. | ||
|
||
The PRECIS framework represents the most modern standardized solution today for doing this sort of mapping and handling of internationalized names, and should mitigate most of the issues around this. | ||
Review comment: I think this is a highly misleading statement. Reading Section 12.5 (Security Considerations - Visually Similar Characters) of RFC 7564 it says:
Reply: Unless I'm mistaken, I believe this would be covered by the rules of the Nickname profile itself here (specifically, 3+4+5). Regardless, I'll have another read over both those documents and probably adjust the text here to make it more clear exactly what I'm referring to, thanks for pointing this out.
||
|
||
Another issue with allowing such a wide range of characters is the ability for those new characters to break the protocol. The section in this document regarding Restricted Characters aims to prevent this from being an issue. | ||
|
||
Legacy clients may have issues working correctly with unicode names. It is acknowledged that issues may exist around this, and in particular it is very difficult to understand how clients may react to unicode names given the very vast base of clients out there today. I do not attempt to address this concern within this document. |
Review comment: Unicode is a proper noun so it should be capitalised.