Add document for Unicode casemapping #272

DanielOaks · 2016-09-15T07:42:35Z

Unicode names have been wanted for a while, and have been used in experimental implementations as well as in certain bouncers that integrate with other messaging systems.

This document outlines a method based on RFC 7700 which should represent a reasonable, modern solution for those projects that wish to allow Unicode characters and casemap them appropriately.

There's previous discussion around this in #259.

This casemapping does not specify any backwards-compatibility measures. Compatibility with clients and servers that cannot correctly handle Unicode has been brought up many times during discussions about Unicode casemappings. Below are some of the most reasonable suggestions, and why I haven't included them in this specification:

Encoding names so non-rfc7700 servers can accept them

This suggestion revolves around the client encoding nicknames and channel names into currently IRC-friendly characters before it sends them to the server (allowing them to be used on every server out there today). When receiving these encoded names, other Unicode-aware clients will decode them to their proper Unicode counterparts before displaying them.

Pros

Unicode nicknames and channel names can be used on servers that don't natively support Unicode.
Non-Unicode-aware clients can connect to servers that are Unicode-aware.

Cons

Non-Unicode-aware servers will allow nicknames that look like duplicates, due to the encoding required and the server not being able to enforce the name preparation described above.
Possible duplication of names by encoding names that contain only IRC-friendly characters (or otherwise, strict client-side checking that is likely to be misinterpreted or misimplemented).

Because of the security implications this would bring up, I think this is an extremely bad idea.
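To make the cons above concrete, here is a minimal sketch of the client-side approach. The proposal does not name an encoding; this assumes Punycode with an `xn--` prefix borrowed from IDNA, and `encode_name`/`decode_name` are hypothetical helpers.

```python
def encode_name(name: str) -> str:
    """Hypothetical client-side encoding before sending a name to a non-Unicode server."""
    if name.isascii():
        # Already IRC-friendly: sent unchanged, so it can collide with encoded names.
        return name
    return "xn--" + name.encode("punycode").decode("ascii")


def decode_name(wire: str) -> str:
    """What a Unicode-aware client would display for a name received from the server."""
    if wire.startswith("xn--"):
        return wire[len("xn--"):].encode("ascii").decode("punycode")
    return wire


print(encode_name("bücher"))                      # xn--bcher-kva
print(decode_name(encode_name("xn--bcher-kva")))  # bücher (a plain ASCII nick that displays like an encoded one)
```

Two problems fall out of this. The ASCII nick `xn--bcher-kva` displays exactly like an encoded `bücher` (the duplication con). And a precomposed `ü` (U+00FC) versus `u` followed by a combining diaeresis (U+0308) produce different wire names even though they render identically, which a non-Unicode server has no way to detect.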

Encoding names so that non-unicode-aware clients can accept them

This suggestion revolves around the server encoding nicknames and channel names into currently IRC-friendly characters before it sends them to the client (allowing them to be accepted by every client out there). When receiving these encoded names, Unicode-aware clients will decode them into their proper Unicode counterparts before displaying them.

Pros

We can be assured that any client, including ones that can't handle Unicode, will be able to accept the names.

Cons

Due to the encoding, these nicknames are not going to be easily readable by non-Unicode-aware clients, and will appear as a blob of unreadable text.
Even if we only encode names that contain special characters, that complicates message sending in ways that are likely to irritate server authors into not implementing this.
The decoding/encoding required by this (particularly if only certain names are encoded) complicates client programming in ways that are likely to be misimplemented.

I don't think this is required, because getting this casemapping widely implemented will take time. By the time it is in wide enough use to warrant worrying about legacy clients, I think a large majority of the clients currently in use will support Unicode names without issues. Also, a number of clients already accept Unicode names successfully.

Because of the complexity this process adds, and because I expect legacy clients to be a non-issue by the time this is implemented, I don't think this should happen. It's more effort than it's worth, and I think this measure would cause more problems than it would solve.

attilamolnar · 2016-09-15T07:59:16Z

documentation/rfc7700.md

+* `(',', 0x2C)` - Used as a separator.
+* `('*', 0x2A)` - Used in mask matching.
+* `('?', 0x3F)` - Used in mask matching.
+* `('!', 0x21)` - Separates username from hostname.


Fix: separates nickname from username

grawity · 2016-09-15T07:55:39Z

documentation/rfc7700.md

+Nicknames cannot contain the following characters:
+
+* `(' ', 0x20)` - Separates parameters.
+* `(':', 0x3A)` - Separates trailing parameter.


This & identical entries below seem unnecessary; : only has special meaning as the first character of a parameter.

It does, but since it's already disallowed and I could see it causing possible confusion with libraries that split parameters strangely, figured it was better to disallow it. If we figure it's not required I can definitely remove it though.

Might as well forbid : in privmsgs, topics, etc. A library blindly splitting on : is not "strange", it's outright buggy.

Hmm, that's fair. In that case, I can just note that the first character of a name can't be ':'? (If, e.g., a nickname started with ':', you wouldn't be able to use it in normal messages.)

Yeah, it should be fine under the "first character" list. (For channels it's already implied by CHANTYPE.)
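A minimal sketch of the nickname character rules as they stand after this thread: ':' is rejected only as the first character, and the other characters are taken from the diff excerpts quoted in this review. The exact sets are an assumption (the document's final list may differ), and `is_valid_nick` is a hypothetical helper that expects an already-prepared name.

```python
# Assumed restricted sets, taken from the diff excerpts in this review.
DISALLOWED_ANYWHERE = frozenset({" ", ",", "*", "?", "!"})
DISALLOWED_FIRST = frozenset({":"})


def is_valid_nick(nick: str) -> bool:
    """Check an already-prepared nickname against the restricted characters."""
    if not nick or nick[0] in DISALLOWED_FIRST:
        # Reject empty names, and names that would be parsed as a trailing parameter.
        return False
    return not any(ch in DISALLOWED_ANYWHERE for ch in nick)


print(is_valid_nick("da:n"))  # True: ':' is only restricted as the first character
print(is_valid_nick(":dan"))  # False
```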

grawity · 2016-09-15T08:00:04Z

documentation/rfc7700.md

+
+These steps MUST happen in the order shown, or else the restricted characters check may miss characters that should be legitimately restricted.
+
+If a name does contain a restricted character (whether disallowed by the [Nickname profile](https://tools.ietf.org/html/rfc7700#section-2.2) or this document), it MUST be rejected by the server and MUST NOT be propogated to other clients. This is done through the appropriate numeric for the command which tried to set or use the invalid name such as `ERR_ERRONEUSNICKNAME`, `ERR_NOSUCHCHANNEL`, or whichever numeric is most appropriate.


At this point, it's better to use the named link syntax:

The `rfc7700` casemapping uses the PRECIS [Nickname profile][precis] as defined in [Section 2 of RFC 7700][precis].

[precis]: https://tools.ietf.org/html/rfc7700#section-2

Cool, I changed the other links above to use the named link syntax, since they were all referring to the same URL. The link here and the link to rfc7700 at the top differ from the others because they link to different sections (and each of those URLs only appears once in the doc).
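The diff excerpt in this review says preparation must come before the restricted-character check. A small sketch shows why, assuming a rough stand-in for the PRECIS profile (NFKC normalisation followed by `str.lower()`; this is not a PRECIS implementation) and an illustrative restricted set. FULLWIDTH EXCLAMATION MARK (U+FF01) passes a check on the raw input, but normalises to `!`. The function names are hypothetical.

```python
import unicodedata

# Illustrative restricted set, assumed from the excerpts in this review.
RESTRICTED = frozenset({" ", ",", "*", "?", "!"})


def prepare(name: str) -> str:
    """Rough stand-in for PRECIS preparation: NFKC, then lowercase."""
    return unicodedata.normalize("NFKC", name).lower()


def has_restricted(name: str) -> bool:
    return any(ch in RESTRICTED for ch in name)


def accepted_check_then_prepare(name: str) -> bool:
    # Wrong order: the raw input is checked, so compatibility characters slip through.
    return not has_restricted(name)


def accepted_prepare_then_check(name: str) -> bool:
    # Order required by the excerpt: prepare first, then check the prepared form.
    return not has_restricted(prepare(name))


name = "nick\uff01"  # FULLWIDTH EXCLAMATION MARK
print(accepted_check_then_prepare(name))  # True
print(accepted_prepare_then_check(name))  # False, since NFKC turns U+FF01 into "!"
```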

attilamolnar · 2016-09-15T08:23:03Z

documentation/rfc7700.md

+* `('*', 0x2A)` - Used in mask matching.
+* `('?', 0x3F)` - Used in mask matching.
+* `('.', 0x2E)` - Denotes a server name.
+* `('!', 0x21)` - Separates username from hostname.


Same as previously


attilamolnar · 2016-09-15T09:12:23Z

documentation/rfc7700.md

+Hostnames cannot contain the following charactes:
+
+* `(' ', 0x20)` - Separates parameters.
+* `(':', 0x3A)` - Separates trailing parameter.


IPv6 IPs need : in the hostname (spotted by @jobe1986)

They even technically can have it as the first character. Which is a bit problematic for 352 RPL_WHOREPLY and 311 RPL_WHOISUSER.

Servers add a 0 prefix to IPv6 IPs beginning with : so that's not a problem.

My mistake, meant to remove those with another change. This has been removed.
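For the IPv6 case mentioned above, the `0` prefix can be sketched as below. `host_for_reply` is a hypothetical helper; the final line uses Python's `ipaddress` module to confirm the prefixed form still denotes the same address.

```python
import ipaddress


def host_for_reply(host: str) -> str:
    """Prefix '0' to hosts starting with ':' so they can't be read as a trailing parameter."""
    return "0" + host if host.startswith(":") else host


print(host_for_reply("::1"))  # 0::1

# The prefixed form still names the same address.
assert ipaddress.ip_address(host_for_reply("::1")) == ipaddress.ip_address("::1")
```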

attilamolnar · 2016-09-15T09:17:37Z

@DanielOaks Could you add some examples, particularly ones that illustrate how comparisons work?

jwheare · 2016-09-15T09:28:06Z

documentation/rfc7700.md

+* `(' ', 0x20)` - Separates parameters.
+* `(',', 0x2C)` - Used as a separator.
+* `('*', 0x2A)` - Used in mask matching.
+* `('?', 0x3F)` - Used in mask matching.


Is mask matching in channel names a thing? * and ? are valid channel characters at the moment, this seems overly restrictive. (spotted by @jobe1986)

Cool, removed those

We (InspIRCd) use glob matching on channel names in various places.

@SaberUK Example? Do you also forbid those characters in channel names or is there just no way to specify them without accidentally over-globbing?

Actually, I just tested and was able to create a channel on Insp with both * and ?. I think the recommendation should probably not go against existing valid characters.

@jwheare We don't presently forbid them although they are used in various places like e.g.

https://github.com/inspircd/inspircd/blob/master/docs/conf/modules.conf.example#L714

This does unfortunately result in some problems like what you mentioned though.

It is possible to block them, as documented: https://github.com/inspircd/inspircd/blob/insp20/docs/conf/modules.conf.example#L417-L419

grawity · 2016-09-15T10:01:50Z

For clarification: Does PRECIS affect only comparisons or display as well? If it affects display, does the PRECIS case-folding rule mean that it's impossible to use mixed-case nicknames (since they get mapped to lowercase)?

attilamolnar · 2016-09-15T10:13:17Z

@grawity As I understand it, it only affects comparisons; if adopting it meant losing uppercase characters in nicks, it would be a step backwards.

attilamolnar · 2016-09-15T10:29:33Z

documentation/rfc7700.md

+* `('6', 0x36)` - Disallowed.
+* `('7', 0x37)` - Disallowed.
+* `('8', 0x38)` - Disallowed.
+* `('9', 0x39)` - Disallowed.


Not allowing numbers as the first char of a nick shouldn't be in the spec for these reasons:

Servers already change the nick of clients to nicks starting with a number e.g. in case of collision and with this restriction that is a violation of the spec.

Presently most (or all) servers don't allow nicks starting with numbers but in the future servers should be able to relax this restriction without updating the casemapping.

There's nothing stopping servers from accepting a subset of nicks allowed by this spec (they can send an invalid nick numeric for any nick they don't like) so servers can still disallow digits if they want but they cannot allow more nicks than what this spec allows. Also clients must be prepared to see nicks starting with digits.

DanielOaks · 2016-09-15T11:12:00Z

@grawity @attilamolnar Correct, PRECIS (and casemapping) does not affect display, similarly to how casemapping works currently.
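Since the casemapping only affects comparisons, a server can keep the name exactly as the user typed it for display and use the casemapped form purely as a lookup key. A minimal sketch, with `NickRegistry` and `casefold_key` as hypothetical names and NFKC plus `str.lower()` standing in for the real casemapping:

```python
import unicodedata
from typing import Dict, Optional


def casefold_key(name: str) -> str:
    """Comparison key only (stand-in for the casemapping); never shown to users."""
    return unicodedata.normalize("NFKC", name).lower()


class NickRegistry:
    """Stores names as typed for display, indexed by their casemapped key."""

    def __init__(self) -> None:
        self._by_key: Dict[str, str] = {}

    def register(self, nick: str) -> bool:
        key = casefold_key(nick)
        if key in self._by_key:
            return False  # same name as an existing one under the casemapping
        self._by_key[key] = nick  # original form kept for display
        return True

    def display_name(self, nick: str) -> Optional[str]:
        return self._by_key.get(casefold_key(nick))


registry = NickRegistry()
registry.register("DanielOaks")
print(registry.register("danieloaks"))      # False
print(registry.display_name("DANIELOAKS"))  # DanielOaks
```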

DanielOaks · 2016-09-15T11:35:27Z

Made the Disallowed Characters section recommended instead of required, as suggested by @attilamolnar, threw * and ? back into the usernames section, various other minor edits of the copy.

M2Ys4U · 2016-09-15T11:52:13Z

PRECIS (RFC 7564) defines two classes, IdentifierClass and FreeformClass, the Nickname profile (RFC 7700) builds upon the latter.

To quote from 7564 (with emphasis added by me):

IdentifierClass: a sequence of letters, numbers, and some symbols that is used to identify or address a network entity such as a user account, a venue (e.g., a chatroom), an information source (e.g., a data feed), or a collection of data (e.g., a file); the intent is that this class will minimize user confusion in a wide variety of application protocols, with the result that safety has been prioritized over expressiveness for this class.

FreeformClass: a sequence of letters, numbers, symbols, spaces, and other characters that is used for free-form strings, including passwords as well as display elements such as human-friendly nicknames for devices or for participants in a chatroom; the intent is that this class will allow nearly any Unicode character, with the result that expressiveness has been prioritized over safety for this class. Note well that protocol designers, application developers, service providers, and end users might not understand or be able to enter all of the characters that can be included in the FreeformClass -- see Section 12.3 for details.

With that context out of the way, here's my question:

Should we be re-using the Nickname profile (RFC 7700) for channel names as well as nicks and usernames?

It would make more sense to me to restrict channel names to the IdentifierClass, however I can see the appeal of using a single algorithm for all IRC identifiers.

DanielOaks · 2016-09-15T11:57:34Z

That's a good point... Using multiple algorithms (one for chans, one for nicks, and/or something similar) is, imo, just begging for trouble, but I'll certainly take a closer look and reread that, thanks for pointing it out.

M2Ys4U · 2016-09-15T12:04:39Z

documentation/rfc7700.md

+
+With the large numbers of new characters allowed comes the risk of introducing confusion for users. The PRECIS framework (much like the earlier framework [stringprep](https://tools.ietf.org/html/rfc3454)) aims to avoid this through mapping confusable characters to a single base character, and by allowing specific known-good characters.
+
+The PRECIS framework represents the most modern standardized solution today for doing this sort of mapping and handling of internationalized names, and should mitigate most of the issues around this.


I think this is a highly misleading statement.

Reading Section 12.5 (Security Considerations - Visually Similar Characters) of RFC 7564 it says:

Because PRECIS-compliant strings can contain almost any properly encoded Unicode code point, it can be relatively easy to fake or mimic some strings in systems that use the PRECIS framework. The fact that some strings are easily confused introduces security vulnerabilities of the kind that have also plagued the World Wide Web, specifically the phenomenon known as phishing.

[...]

Because it is impossible to map visually similar characters without a great deal of context (such as knowing the font families used), the PRECIS framework does nothing to map similar-looking characters together, nor does it prohibit some characters because they look like others.

[...]

The challenges inherent in supporting the full range of Unicode code points have in the past led some to hope for a way to programmatically negotiate more restrictive ranges based on locale, script, or other relevant factors; to tag the locale associated with a particular string; etc. As a general-purpose internationalization technology, the PRECIS framework does not include such mechanisms.

Unless I'm mistaken, I believe this would be covered by the rules of the Nickname profile itself here (specifically, 3+4+5). Regardless, I'll have another read over both those documents and probably adjust the text here to make it more clear exactly what I'm referring to, thanks for pointing this out.

SadieCat · 2016-09-15T18:25:38Z

documentation/rfc7700.md

+
+Names being prepared MUST apply the following rules in the order shown:
+
+1. Preperation using the PRECIS [Nickname profile][precis].


s/Preperation/Preparation/

SadieCat · 2016-09-15T18:26:22Z

documentation/rfc7700.md

+    period: "2016"
+    email: "daniel@danieloaks.net"
+---
+This document describes a unicode-aware casemapping for IRC, based on the recommendations in [RFC 7700](https://tools.ietf.org/html/rfc7700).


Unicode is a proper noun so it should be capitalised.


DanielOaks · 2017-01-13T16:42:56Z

Yo @M2Ys4U, now using UsernameCaseMapped (an IdentifierClass profile) for everything. In my tests... seems to work fine, and if it's better locked-down than the Nickname class then all the better.

lopcode · 2017-01-16T14:09:39Z

On IRC we discussed the use case of emoji in channel names (#🥕 for example) - irccloud and others allow this in production right now. It seems UsernameCaseMapped might disallow such channel names.

@DanielOaks is investigating the difficulty of a custom precis profile that permits such modifications to other profiles.

syzop · 2017-11-19T14:36:19Z

Hmm. I can't find any C library that has PRECIS and those profiles. But that could also be my current lack of knowledge with regards to unicode (and utf8). In any case, the availability of a library or drop-in code that various IRCd's could use for checking "is this nick permitted?" and "are these nicks the same?" would make implementing this much more doable, possibly even crucial for success. And of course, not just for IRC servers but also for services and (I suppose) clients.

Also, I read that as of October 2017 RFC8265 obsoletes RFC7613 and RFC8266 obsoletes RFC7700.

DanielOaks · 2017-11-19T17:41:50Z

Yeah, there's some trouble with this approach around confusable characters, so I've got this specification 'on hold' until I work out those issues. Once I've got those issues worked out I'll change this spec from 7613 to one of the newer RFC numbers.

To be specific, PRECIS doesn't in any way attempt to map confusable characters to a single codepoint. Well, it does, really, but only certain confusable characters, and not others. Which means you can actually get two nicknames that look exactly the same following this method. See also, section 12.5 - Visually Similar Characters of RFC 7564.

syzop · 2017-11-23T18:12:31Z

I only saw your edit just now:

To be specific, PRECIS doesn't in any way attempt to map confusable characters to a single codepoint. Well, it does, really, but only certain confusable characters, and not others. Which means you can actually get two nicknames that look exactly the same following this method.

That is disappointing. I guess I misunderstood what PRECIS does then (could be because I didn't read it :D). I must say that the Security Considerations in your draft gave me a bit of a false sense of security.. it starts with saying it has considerable security impact but then outlines the avoiding of confusing characters etc. etc... it sounded quite reassuring. So you may want to reword that or, better, see if a solution is possible (see next).

I think for something workable on IRC you would have to "solve" the problem of identical looking UTF8 nicks as well. Or give suggestions about what should be done in the IRCd. Don't you agree?
As a UTF-8 noob I'm not really in a position to do this, but perhaps suggesting allowing only certain scripts, or only specific combinations...

DanielOaks · 2017-11-24T00:14:23Z

For sure, yeah. I assumed PRECIS protected against that as well (because it does map a fair few of those characters together, just not the identical-looking ones). I wrote up the spec, then someone demonstrated certain pairs of characters that look identical, but the PRECIS UsernameCaseMapped profile keeps separate, so yeah.

Don't you love Unicode?

I'm planning on something along those lines as well, similar to the PRECIS suggestions around possibly only allowing one script or similar (as much as that feels like a copout).

syzop · 2017-12-06T08:03:36Z

@DanielOaks: I tried to contact you a while back via email (27 Nov 11:53 UTC) from syzop@vunscan.org. I could put part of that here in the open:

I've added experimental UTF8 support in set::allowed-nickchars in UnrealIRCd which allows the admin to allow certain utf8 characters in nick names. In the release notes I mention that, like the original set::allowed-nickchars, it does not do any special CASEMAPPING or "similar looking character detection", and summing up the known problems with the lack of such support. I also noticed that for example anope does not seem to allow such characters which further limits the current use.
So, I'm not happy with the present state. In practice for serious networks, it's not so much usable. It's more of an experimental thing so users can play around, hoping to get that UTF8 ball rolling a bit. It's sad to see that UTF8 nick name support is still lacking in IRC in 2017.

In my opinion the goal of IRCv3, or in any case the IRC community in general, should be to add a new CASEMAPPING in some standard way/library/tables so that the same casemapping (and the other things PRECIS does) is applied the same way by IRC servers and services (and clients). If every software implementation chooses its own casemapping, that's rather annoying and confusing. This is especially notable in the servers vs. services case, where e.g. account names are compared. The spec is just as important as having common code/libs/implementations.

In my email to you I also ask for some technical suggestions with regards to that.
Just checking that you received it. If you did and don't think you have anything useful to reply, or don't want to, or don't have time, that's fine too of course. Just checking... it would be a pity if an opportunity for collaboration were missed just because of some misunderstanding or some mail ending up in the junk folder.

DanielOaks · 2017-12-06T08:11:26Z

Heyo @syzop! Sorry for not responding, emails have fallen behind a little with lots going on at work and home. I'm thinking of changing this proposal slightly to better integrate with existing servers that use something like CASEMAPPING=rfc1459, as well as clarifying more precisely in the spec the issues with this folding method and how to avoid those issues.

Totally agree with the standard way to do such a thing, that's been mostly the intention of this since it started coming up and since I threw this proposal in.

I'll respond to your email and either later this week or over the weekend throw those changes into this specification to clear things up and make it easier for existing servers to implement. Thanks for the push with this and I'm excited to see what comes of Unreal's new experimental char support :)

syzop · 2017-12-06T08:19:36Z

Great. And no problem at all! Glad to see your continued interest and look forward to working with you.


jwheare · 2018-01-03T12:11:33Z

s/preperation/preparation/g

Should there be a way to specify an allowed list of characters/sets as described in the visually similar section? Another ISUPPORT token?

DanielOaks · 2018-01-03T12:22:09Z

Thanks, will fix that spelling issue.

Nah, no way to do something like that. It'd end up being a multi-KB (maybe even MB) blob sent to the client on connection/registration, and we'd be inventing the format from scratch. Since the server can just refuse that channel/nick name they can just refuse it with an appropriate error message in the numeric if they want to.

jwheare · 2018-01-03T12:25:39Z

Fair enough. Just wondering if something like Unreal's allowed-nickchars labels might work instead of a huge list of characters https://www.unrealircd.org/docs/Nick_Character_Sets

hebrew-utf8, etc aren't really standard labels but maybe something similar exists?

DanielOaks · 2018-01-03T12:32:19Z

You could try, but they wouldn't be able to be used as anything more than a rough "Hey this is what we sorta allow".

In addition to standard character sets, since the issue is around confusable characters you'd also need to have the ability to list explicit characters as well, which combinations of characters specifically are allowed and which combinations aren't allowed together.

For instance, in my (admittedly fairly lax) server I'm looking at doing an interesting approach around the confusables lists distributed by Unicode and somewhat-heuristically along with character sets determining whether specific names are allowed. Those are the sorta details you can't really codify.

Given the above, I think just plain leaving it to the server and them explicitly telling the user why that nick/channel name isn't allowed would be the best option.

DanielOaks · 2018-02-12T12:03:36Z

As a note, there's a running implementation of this on testnet.oragono.io for anyone interested in giving it a shot and seeing how it works.

It should be noted that, aside from the banned characters in the spec, it doesn't implement any of the additional protections recommended (they're planned, but it's a lot of work to either build up whitelists or build some sort of blacklist system based on Unicode's confusables list). This means it's pretty easy to get similar-looking nicks using this homograph attack generator, but that's just what you get when you don't implement proper protections. I don't consider this a spec issue, because the spec leaves those specific protection mechanisms up for debate. We can't legislate a single good one because there isn't a single agreed-upon good one. Even browsers change how they do it regularly, and they've had to deal with it a lot longer than us.


DanielOaks · 2019-02-12T23:00:21Z

Yo @slingamn if you could take a look over the two commits just added that'd be ace. You're the one that's looked deeper into the skeletonisation requirements so if I've got anything wrong just let me know ;)

slingamn · 2019-02-12T23:38:16Z

documentation/rfc8265.md

@@ -73,7 +73,9 @@ As noted in the [Visually Similar Characters section](https://tools.ietf.org/htm

 With the new allowed Unicode characters comes the ability to use characters that look the same. For example, `E (0x45)`, `Ε (U+0395)` (Greek Capital Letter Epsilon), and `Е (U+0415)` (Cyrillic Capital Letter IE) look the same in most fonts, but are treated as separate characters by this casemapping. More examples of these can be found in Unicode's [Confusables document](https://www.unicode.org/Public/security/latest/confusables.txt).

-To combat this, we recommend only allowing characters from a single character set or locale to be used in names, or for the allowed characters to be a specific list of known, non-confusable characters. Other recommendations are available in the [Visually Similar Characters section](https://tools.ietf.org/html/rfc8264#section-12.5) of the PRECIS framework specification. Names that have the opportunity to be confusing SHOULD be disallowed by servers.
+Unicode skeletonisation is the method we recommend to combat this. For each identifier (nick/channel name) on the server, a 'skeleton' is generated by taking the **casefolded** name, and then applying to it the transformations described in the [Unicode Security Mechanisms document](http://unicode.org/reports/tr39/#Confusable_Detection). These skeletons, if used, MUST ONLY be used for comparison, and not as any user-visible identifier (as they intentionally contain complicated mixes of scripts and characters). When users change nicknames or create new channels, the casefolded names should be compared and the skeletons should also be compared to ensure that both are globally-unique (with any non-unique names rejected outright). This seems to be the most reliable method as of right now, but does require storing the skeletons of all in-use names for comparison purposes.


This isn't what we implemented in oragono --- we skeletonize the unfolded, original identifier (the one that is displayed to the users), and only then apply a round of width and case normalization. The rationale is that an initial round of casefolding may lose information about visual confusability. Hypothetically, you could have a non-Latin character with both uppercase and lowercase forms, such that its uppercase form is visually confusable with a Latin character but its lowercase form is visually distinct. Casefolding first would allow an impersonation attack using the uppercase character.

To be honest, I think it would help to see how this plays out in the wild --- try to get real-world user stories from people using non-Latin scripts, also see if we can get some Unicode experts to play with the implementation and try to break it.
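The ordering argument above can be shown with a toy example. This sketch follows the UTS #39 skeleton steps (NFD, map confusables, NFD again) but uses a hypothetical two-entry confusables table in which only the capital letters Greek Epsilon and Cyrillic IE map to Latin `E`, mirroring the hypothetical character described in the review. NFKC plus `str.lower()` stands in for width and case normalisation; all function names are illustrative.

```python
import unicodedata

# Hypothetical two-entry excerpt of Unicode's confusables data; a real
# server would load the whole file. Only the capital letters map to Latin 'E'.
CONFUSABLES = {
    "\u0395": "E",  # GREEK CAPITAL LETTER EPSILON
    "\u0415": "E",  # CYRILLIC CAPITAL LETTER IE
}


def skeleton(s: str) -> str:
    """UTS #39 style skeleton: NFD, replace each confusable, NFD again."""
    s = unicodedata.normalize("NFD", s)
    s = "".join(CONFUSABLES.get(ch, ch) for ch in s)
    return unicodedata.normalize("NFD", s)


def fold(s: str) -> str:
    """Stand-in for width and case normalisation: NFKC, then lowercase."""
    return unicodedata.normalize("NFKC", s).lower()


def key_fold_first(name: str) -> str:
    # Order described in the diff: casefold, then skeletonise.
    return skeleton(fold(name))


def key_skeleton_first(name: str) -> str:
    # Order described for oragono: skeletonise the original, then fold.
    return fold(skeleton(name))


real, fake = "Eve", "\u0395ve"  # Latin E vs Greek capital Epsilon
print(key_fold_first(real) == key_fold_first(fake))          # False: impersonation slips through
print(key_skeleton_first(real) == key_skeleton_first(fake))  # True: collision detected
```

With the casefold-first order, `Εve` lowercases to `εve`, which this table doesn't map, so it never collides with `Eve`. Skeletonising the original first maps `Ε` to `E` before any case information is lost.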

DanielOaks · 2021-02-24T17:30:17Z

I think the only thing to fix with this spec is that we described our skeletonisation a bit incorrectly, but otherwise this should describe our implementation pretty well.

As far as a general i18n name solution, I like the prospect of display names more than this spec because of the simpler implementation (once you've got Metadata at least) and the flexibility. It takes a lot to keep Unicode identifiers unique, evident from our skeletonisation description above.

We're gonna keep this implementation in Oragono, but I might publish this as a vendor thing on our site instead of keeping it as a PR here, particularly as PRECIS libraries usable by IRC servers written in C (and the like) don't really seem to be available.

Add document for rfc7700 casemapping

49c702b

attilamolnar reviewed Sep 15, 2016


grawity reviewed Sep 15, 2016


rfc7700: Improve links, fix incorrect description

8f48b1c

attilamolnar reviewed Sep 15, 2016


rfc7700: Fix incorrect description, only disallow : as the first character of names

357cde6

attilamolnar reviewed Sep 15, 2016


jwheare requested changes Sep 15, 2016


attilamolnar reviewed Sep 15, 2016


rfc7700: Fix mistake, remove legacy restrictions

0a605cd

rfc7700: Make Disallowed Characters section recommended, various edits

36cfa0c

rfc7700: Add * and ? to hostnames as well

f1a3082

jwheare approved these changes Sep 15, 2016


M2Ys4U reviewed Sep 15, 2016


SadieCat reviewed Sep 15, 2016


rfc7700: Incorportate fixes/suggestions from @SaberUK

382c522

DanielOaks mentioned this pull request Sep 17, 2016

Use rfc7700 for nick/channel names ergochat/ergo#9

Closed

DanielOaks mentioned this pull request Oct 19, 2016

Allow Unicode in nicknames #259

Open

Mikaela mentioned this pull request Dec 5, 2016

Channel names do not support unicode characters freenode/ircd-seven#20

Open

jwheare added the protocol label Jan 7, 2017

jwheare modified the milestone: Roadmap Jan 7, 2017

DanielOaks changed the title ~~Add document for rfc7700 casemapping~~ Add document for Unicode casemapping Jan 13, 2017

unicode_casemapping: 7700 -> 7613. Now using UsernameCaseMapped.

0d6435d

Using an IdentifierClass, as pointed out by @M2Ys4U, is much better than using a FreeformClass.

rfc7613: Update copyright years

fb9dc99

jwheare added the Unresolved decisions label Aug 9, 2017

DanielOaks added 2 commits December 26, 2017 07:49

unicode_casemapping: 7613 -> 8265. UTF8MAPPING to allow better compatibility with servers, laid out Visually Similar Characters section

807e084

rfc8265: Fix text

fa23d59

DanielOaks mentioned this pull request Nov 23, 2018

Support UTF-8 custom prefix characters inspircd/inspircd#1533

Closed

DanielOaks added 2 commits February 13, 2019 08:45

Add additional advice for casefolding channel names, to avoid hitting the Bidi rule

634fb6d

Add Unicode skeletonisation recommendation to avoid confusables

840aa78

slingamn reviewed Feb 12, 2019


DanielOaks mentioned this pull request Jul 1, 2019

IRCX / MSN Chat - Pre 2000 was more advanced than you might think... ircv3/ircv3-ideas#48

Open

slingamn mentioned this pull request Aug 4, 2020

Channel renaming extension #420

Merged

jwheare closed this Feb 25, 2021


