Add document for Unicode casemapping #272

Closed
wants to merge 13 commits into from


DanielOaks
Member

Unicode names have been wanted for a while, and have been used in experimental implementations as well as in certain bouncers integrating with other messaging systems.

This document outlines a method based on RFC 7700 which should represent a reasonable, modern solution for those projects that wish to allow Unicode characters and casemap them appropriately.

There's previous discussion around this in #259.

This casemapping does not specify any sort of backwards-compatibility measures. Being compatible with clients and servers that cannot correctly handle Unicode has been brought up many times during discussions about Unicode casemappings. Below are some of the most reasonable suggestions, and why I haven't included them in this specification:

Encoding names so non-rfc7700 servers can accept them

This suggestion revolves around the client encoding nicknames and channel names into currently IRC-friendly characters before it sends them to the server (allowing them to be used on every server out there today). When receiving these encoded names, other unicode-aware clients will decode them to their proper unicode counterpart before displaying them.

Pros

  • Unicode nicknames and channel names can be used on servers that don't natively support unicode.
  • Non-unicode-aware clients can connect to servers that are unicode-aware.

Cons

  • Non-unicode-aware servers will allow nicknames that look like duplicates, due to the encoding required and the server not being able to enforce the name preparation described above.
  • Possible duplication of names by encoding names that contain only irc-friendly characters (or otherwise, strict client-side checking that is likely to be misinterpreted or misimplemented).

Because of the security implications this would bring up, I think this is an extremely bad idea.

Encoding names so that non-unicode-aware clients can accept them

This suggestion revolves around the server encoding nicknames and channel names into currently IRC-friendly characters before it sends them to the client (allowing them to be accepted by every client out there). When receiving these encoded names, unicode-aware clients will decode them into their proper unicode counterpart before displaying them.

Pros

  • We can be assured that any client, including ones that can't do unicode, will be able to accept the names.

Cons

  • Due to the encoding, these encoded nicknames are not going to be easily readable by non-unicode-aware clients, and are going to appear as a blob of unreadable text.
  • Even if we only encode names that contain special characters, that complicates message sending in ways that are likely to irritate server authors into not implementing this.
  • The decoding/encoding required by this (particularly if only certain names are encoded) complicates client programming in ways that are likely to be misimplemented.

I don't think this is required because getting this casemapping widely implemented will take time. By the time this casemapping gets into large enough use to warrant worrying about legacy clients, I think a large majority of the clients currently in use will support unicode names without issues. As well, a number of clients already successfully accept unicode names.

Because of the complexity this process adds, and because I expect it to be a non-issue by the point this is implemented, I think this measure would just cause more problems than it would solve.

* `(',', 0x2C)` - Used as a separator.
* `('*', 0x2A)` - Used in mask matching.
* `('?', 0x3F)` - Used in mask matching.
* `('!', 0x21)` - Separates username from hostname.

Contributor

Fix: separates nickname from username

Nicknames cannot contain the following characters:

* `(' ', 0x20)` - Separates parameters.
* `(':', 0x3A)` - Separates trailing parameter.

Contributor

@grawity grawity Sep 15, 2016

This & identical entries below seem unnecessary; : only has special meaning as the first character of a parameter.

Member Author

It does, but since it's already disallowed and I could see it causing possible confusion with libraries that split parameters strangely, figured it was better to disallow it. If we figure it's not required I can definitely remove it though.

Contributor

Might as well forbid : in privmsgs, topics, etc. A library blindly splitting on : is not "strange", it's outright buggy.

Member Author

Hmm, that's fair. In that case, I can just note that the first character of one can't be ':'? (If, e.g., a nickname started with ':', you wouldn't be able to use it in normal messages.)

Contributor

Yeah, it should be fine under the "first character" list. (For channels it's already implied by CHANTYPE.)
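
To illustrate the point made in this thread, here is a minimal Python sketch of parameter splitting in which ':' is only special as the first character of a parameter. The function name `split_params` is hypothetical, and the sketch ignores tags, prefixes, and the command itself.

```python
from typing import List


def split_params(rest: str) -> List[str]:
    """Split the parameter portion of an IRC line into parameters."""
    params = []
    while rest:
        if rest.startswith(":"):
            # A leading ':' marks the trailing parameter: keep the remainder whole.
            params.append(rest[1:])
            break
        head, _, rest = rest.partition(" ")
        if head:  # skip empty pieces produced by repeated spaces
            params.append(head)
    return params


# A ':' in the middle of a parameter is ordinary data.
print(split_params("nick:name #chan"))
# A name that starts with ':' swallows everything after it.
print(split_params(":nick #chan"))
```

Under this reading, only a leading ':' needs to be excluded from names; a ':' elsewhere in the name does not change how the line is split.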


These steps MUST happen in the order shown, or else the restricted characters check may miss characters that should be legitimately restricted.

If a name does contain a restricted character (whether disallowed by the [Nickname profile](https://tools.ietf.org/html/rfc7700#section-2.2) or this document), it MUST be rejected by the server and MUST NOT be propogated to other clients. This is done through the appropriate numeric for the command which tried to set or use the invalid name such as `ERR_ERRONEUSNICKNAME`, `ERR_NOSUCHCHANNEL`, or whichever numeric is most appropriate.

Contributor

@grawity grawity Sep 15, 2016

At this point better use the named link syntax:

The `rfc7700` casemapping uses the PRECIS [Nickname profile][precis] as defined in [Section 2 of RFC 7700][precis].

[precis]: https://tools.ietf.org/html/rfc7700#section-2

Member Author

Cool, I changed the other links above to use the named link syntax since they were all referring to the same URL. The link here and the link to rfc7700 up the top differ from the others since they're linking to different sections (and the url's only replicated once throughout the doc).

* `('*', 0x2A)` - Used in mask matching.
* `('?', 0x3F)` - Used in mask matching.
* `('.', 0x2E)` - Denotes a server name.
* `('!', 0x21)` - Separates username from hostname.

Contributor

Same as previously

Hostnames cannot contain the following charactes:

* `(' ', 0x20)` - Separates parameters.
* `(':', 0x3A)` - Separates trailing parameter.

Contributor

@attilamolnar attilamolnar Sep 15, 2016

IPv6 IPs need : in the hostname (spotted by @jobe1986)

Member

@jwheare jwheare Sep 15, 2016

They even technically can have it as the first character. Which is a bit problematic for 352 RPL_WHOREPLY and 311 RPL_WHOISUSER.

Contributor

Servers add a 0 prefix to IPv6 IPs beginning with : so that's not a problem.
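
A minimal sketch of the prefixing described here, assuming the server applies it just before writing the host into a numeric. The helper name `wire_safe_host` is hypothetical and not taken from any particular IRCd.

```python
import ipaddress


def wire_safe_host(host: str) -> str:
    """Prefix a host that begins with ':' so it cannot be read as a trailing parameter."""
    return "0" + host if host.startswith(":") else host


print(wire_safe_host("::1"))          # host that would otherwise start with ':'
print(wire_safe_host("2001:db8::1"))  # left unchanged

# The prefixed form still denotes the same address.
assert ipaddress.ip_address(wire_safe_host("::1")) == ipaddress.ip_address("::1")
```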

Member Author

My mistake, meant to remove those with another change. This has been removed.

@attilamolnar
Contributor

@DanielOaks Could you add some examples, particularly ones that illustrate how comparisons work?

* `(' ', 0x20)` - Separates parameters.
* `(',', 0x2C)` - Used as a separator.
* `('*', 0x2A)` - Used in mask matching.
* `('?', 0x3F)` - Used in mask matching.

Member

@jwheare jwheare Sep 15, 2016

Is mask matching in channel names a thing? * and ? are valid channel characters at the moment, this seems overly restrictive. (spotted by @jobe1986)

Member Author

Cool, removed those

Contributor

We (InspIRCd) use glob matching on channel names in various places.

Member

@SaberUK Example? Do you also forbid those characters in channel names or is there just no way to specify them without accidentally over-globbing?

Actually, I just tested and was able to create a channel on Insp with both * and ?. I think the recommendation should probably not go against existing valid characters.

Contributor

@jwheare We don't presently forbid them although they are used in various places like e.g.

https://github.com/inspircd/inspircd/blob/master/docs/conf/modules.conf.example#L714

This does unfortunately result in some problems like what you mentioned though.
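
The over-globbing problem can be sketched in Python with the standard `fnmatch` module. This is only an approximation of IRC-style glob matching (`fnmatch` also gives `[` and `]` special meaning), and the channel names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical channels, including ones whose names contain '*' and '?'.
channels = ["#a*b", "#aXb", "#a?b", "#ab"]

# An operator who means the literal channel "#a*b" writes it as a glob pattern.
pattern = "#a*b"
matched = [name for name in channels if fnmatchcase(name, pattern)]

print(matched)
```

Every channel in the list matches, so there is no way to single out the channel literally named `#a*b` without an escaping convention.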

@grawity
Contributor

grawity commented Sep 15, 2016

For clarification: Does PRECIS affect only comparisons or display as well? If it affects display, does the PRECIS case-folding rule mean that it's impossible to use mixed-case nicknames (since they get mapped to lowercase)?

@attilamolnar
Contributor

@grawity As I understand it, it only affects comparisons. If adopting it meant losing upper case characters in nicks, it would be a step backwards.

* `('6', 0x36)` - Disallowed.
* `('7', 0x37)` - Disallowed.
* `('8', 0x38)` - Disallowed.
* `('9', 0x39)` - Disallowed.

Contributor

Not allowing numbers as the first char of a nick shouldn't be in the spec for these reasons:

  • Servers already change the nick of clients to nicks starting with a number e.g. in case of collision and with this restriction that is a violation of the spec.
  • Presently most (or all) servers don't allow nicks starting with numbers but in the future servers should be able to relax this restriction without updating the casemapping.

Contributor

There's nothing stopping servers from accepting a subset of nicks allowed by this spec (they can send an invalid nick numeric for any nick they don't like) so servers can still disallow digits if they want but they cannot allow more nicks than what this spec allows. Also clients must be prepared to see nicks starting with digits.

@DanielOaks
Member Author

DanielOaks commented Sep 15, 2016

@grawity @attilamolnar Correct, PRECIS (and casemapping) does not affect display, similarly to how casemapping works currently.
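
As a rough illustration of "comparison only, not display", the sketch below keeps the name as the user typed it and uses a folded key solely for lookups. The folding here (NFKC plus `str.casefold`) is a stand-in assumption and is not the PRECIS profile; `NickRegistry` and `comparison_key` are hypothetical names.

```python
import unicodedata


def comparison_key(name: str) -> str:
    # Stand-in for the casemapping: NOT the actual PRECIS profile.
    return unicodedata.normalize("NFKC", name).casefold()


class NickRegistry:
    """Stores nicks under their folded key but remembers the displayed form."""

    def __init__(self):
        self._by_key = {}

    def claim(self, nick: str) -> bool:
        key = comparison_key(nick)
        if key in self._by_key:
            return False  # already in use under some other casing
        self._by_key[key] = nick  # the original casing is what gets displayed
        return True

    def display(self, nick: str):
        return self._by_key.get(comparison_key(nick))


registry = NickRegistry()
print(registry.claim("DanielOaks"))    # first claim succeeds
print(registry.claim("danieloaks"))    # same nick under the casemapping
print(registry.display("DANIELOAKS"))  # mixed case is preserved for display
```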

@DanielOaks
Member Author

Made the Disallowed Characters section recommended instead of required, as suggested by @attilamolnar, threw * and ? back into the usernames section, various other minor edits of the copy.

@M2Ys4U
Contributor

M2Ys4U commented Sep 15, 2016

PRECIS (RFC 7564) defines two classes, IdentifierClass and FreeformClass; the Nickname profile (RFC 7700) builds upon the latter.

To quote from 7564 (with emphasis added by me):

IdentifierClass: a sequence of letters, numbers, and some symbols that is used to identify or address a network entity such as a user account, a venue (e.g., a chatroom), an information source (e.g., a data feed), or a collection of data (e.g., a file); the intent is that this class will minimize user confusion in a wide variety of application protocols, with the result that safety has been prioritized over expressiveness for this class.

FreeformClass: a sequence of letters, numbers, symbols, spaces, and other characters that is used for free-form strings, including passwords as well as display elements such as human-friendly nicknames for devices or for participants in a chatroom; the intent is that this class will allow nearly any Unicode character, with the result that expressiveness has been prioritized over safety for this class. Note well that protocol designers, application developers, service providers, and end users might not understand or be able to enter all of the characters that can be included in the FreeformClass -- see Section 12.3 for details.

With that context out of the way, here's my question:

Should we be re-using the Nickname profile (RFC 7700) for channel names as well as nicks and usernames?

It would make more sense to me to restrict channel names to the IdentifierClass, however I can see the appeal of using a single algorithm for all IRC identifiers.

@DanielOaks
Member Author

DanielOaks commented Sep 15, 2016

That's a good point... Using multiple algorithms (one for chans, one for nicks, and/or something similar), imo is just begging for trouble but I'll certainly have a closer look into and read of that, thanks for pointing it out.


With the large numbers of new characters allowed comes the risk of introducing confusion for users. The PRECIS framework (much like the earlier framework [stringprep](https://tools.ietf.org/html/rfc3454)) aims to avoid this through mapping confusable characters to a single base character, and by allowing specific known-good characters.

The PRECIS framework represents the most modern standardized solution today for doing this sort of mapping and handling of internationalized names, and should mitigate most of the issues around this.

Contributor

I think this is a highly misleading statement.

Reading Section 12.5 (Security Considerations - Visually Similar Characters) of RFC 7564 it says:

Because PRECIS-compliant strings can contain almost any properly encoded Unicode code point, it can be relatively easy to fake or mimic some strings in systems that use the PRECIS framework. The fact that some strings are easily confused introduces security vulnerabilities of the kind that have also plagued the World Wide Web, specifically the phenomenon known as phishing.

[...]

Because it is impossible to map visually similar characters without a great deal of context (such as knowing the font families used), the PRECIS framework does nothing to map similar-looking characters together, nor does it prohibit some characters because they look like others.

[...]

The challenges inherent in supporting the full range of Unicode code points have in the past led some to hope for a way to programmatically negotiate more restrictive ranges based on locale, script, or other relevant factors; to tag the locale associated with a particular string; etc. As a general-purpose internationalization technology, the PRECIS framework does not include such mechanisms.

Member Author

Unless I'm mistaken, I believe this would be covered by the rules of the Nickname profile itself here (specifically, 3+4+5). Regardless, I'll have another read over both those documents and probably adjust the text here to make it more clear exactly what I'm referring to, thanks for pointing this out.


Names being prepared MUST apply the following rules in the order shown:

1. Preperation using the PRECIS [Nickname profile][precis].

Contributor

s/Preperation/Preparation/

period: "2016"
email: "daniel@danieloaks.net"
---
This document describes a unicode-aware casemapping for IRC, based on the recommendations in [RFC 7700](https://tools.ietf.org/html/rfc7700).

Contributor

Unicode is a proper noun so it should be capitalised.

@DanielOaks DanielOaks changed the title from "Add document for rfc7700 casemapping" to "Add document for Unicode casemapping" on Jan 13, 2017
Using an IdentifierClass, as pointed out by @M2Ys4U, is much better than using a FreeformClass.
@DanielOaks
Member Author

Yo @M2Ys4U, now using UsernameCaseMapped (an IdentifierClass profile) for everything. In my tests... seems to work fine, and if it's better locked-down than the Nickname class then all the better.

@lopcode

lopcode commented Jan 16, 2017

On IRC we discussed the use case of emoji in channel names (#🥕 for example) - irccloud and others allow this in production right now. It seems UsernameCaseMapped might disallow such channel names.

@DanielOaks is investigating the difficulty of a custom precis profile that permits such modifications to other profiles.
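
A rough sketch of why an emoji channel name may fall outside an IdentifierClass-based profile. It assumes that, beyond printable ASCII, only letter, digit, and combining-mark categories are acceptable. This is a simplification for illustration and not the actual PRECIS derivation, which has further exceptions and context rules; `roughly_identifier_class` is a hypothetical name.

```python
import unicodedata

# General categories assumed acceptable for this illustration only.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def roughly_identifier_class(name: str) -> bool:
    """Approximate check: printable ASCII, or a letter/digit/mark code point."""
    for ch in name:
        if 0x21 <= ord(ch) <= 0x7E:
            continue  # printable ASCII is accepted as-is
        if unicodedata.category(ch) not in LETTER_DIGIT_CATEGORIES:
            return False
    return True


print(roughly_identifier_class("#café"))        # accented letter
print(roughly_identifier_class("#\U0001F955"))  # carrot emoji, a symbol
```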

@syzop

syzop commented Nov 19, 2017

Hmm. I can't find any C library that has PRECIS and those profiles. But that could also be my current lack of knowledge with regards to unicode (and utf8). In any case, the availability of a library or drop-in code that various IRCd's could use for checking "is this nick permitted?" and "are these nicks the same?" would make implementing this much more doable, possibly even crucial for success. And of course, not just for IRC servers but also for services and (I suppose) clients.

Also, I read that as of October 2017 RFC8265 obsoletes RFC7613 and RFC8266 obsoletes RFC7700.

@DanielOaks
Member Author

DanielOaks commented Nov 19, 2017

Yeah, there's some trouble with this approach around confusable characters, so I've got this specification 'on hold' until I work out those issues. Once I've got those issues worked out I'll change this spec from 7613 to one of the newer RFC numbers.

To be specific, PRECIS doesn't in any way attempt to map confusable characters to a single codepoint. Well, it does, really, but only certain confusable characters, and not others. Which means you can actually get two nicknames that look exactly the same following this method. See also, section 12.5 - Visually Similar Characters of RFC 7564.
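
The gap can be seen with the three look-alike capitals cited in the draft: Latin E, Greek Epsilon, and Cyrillic IE. The sketch below uses NFKC plus `str.casefold` as a stand-in for the profile's normalization and case mapping (an assumption, not the PRECIS rules themselves) and checks whether the three collapse to one key.

```python
import unicodedata


def fold(name: str) -> str:
    # Stand-in for normalization plus case mapping; not the PRECIS profile.
    return unicodedata.normalize("NFKC", name).casefold()


latin_e = "E"          # U+0045
greek_e = "\u0395"     # Greek Capital Letter Epsilon
cyrillic_e = "\u0415"  # Cyrillic Capital Letter IE

folded = {fold(latin_e), fold(greek_e), fold(cyrillic_e)}
print(len(folded))  # number of distinct keys after folding
```

All three stay distinct after folding, so nicknames built from them would compare as different while looking the same in most fonts.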

@syzop

syzop commented Nov 23, 2017

I only saw your edit just now:

To be specific, PRECIS doesn't in any way attempt to map confusable characters to a single codepoint. Well, it does, really, but only certain confusable characters, and not others. Which means you can actually get two nicknames that look exactly the same following this method.

That is disappointing. I guess I misunderstood what PRECIS does then (could be because I didn't read it :D). I must say that the Security Considerations in your draft gave me a bit of a false sense of security.. it starts with saying it has considerable security impact but then outlines the avoiding of confusing characters etc. etc... it sounded quite reassuring. So you may want to reword that or, better, see if a solution is possible (see next).

I think for something workable on IRC you would have to "solve" the problem of identical looking UTF8 nicks as well. Or give suggestions about what should be done in the IRCd. Don't you agree?
As a UTF8 noob I'm not really in the position to do this, but perhaps by suggesting that only certain scripts or only specific combinations be allowed...

@DanielOaks
Member Author

For sure, yeah. I assumed PRECIS protected against that as well (because it does map a fair few of those characters together, just not the identical-looking ones). I wrote up the spec, then someone demonstrated certain pairs of characters that look identical, but the PRECIS UsernameCaseMapped profile keeps separate, so yeah.

Don't you love Unicode?

I'm planning on something along those lines as well, similar to the PRECIS suggestions around possibly only allowing one script or similar (as much as that feels like a copout).
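
One way a single-script restriction could look is sketched below. Python's standard library does not expose the Unicode Script property, so this heuristic takes the first word of each letter's Unicode character name as its "script". That is an assumption made for illustration and is cruder than a real script lookup; both function names are hypothetical.

```python
import unicodedata


def scripts_used(name: str) -> set:
    """Collect a rough script label for every alphabetic character in name."""
    scripts = set()
    for ch in name:
        if not ch.isalpha():
            continue  # digits and punctuation are treated as script-neutral
        char_name = unicodedata.name(ch, "")
        scripts.add(char_name.split(" ")[0] if char_name else "UNKNOWN")
    return scripts


def is_single_script(name: str) -> bool:
    return len(scripts_used(name)) <= 1


print(is_single_script("paypal"))       # all Latin
print(is_single_script("p\u0430ypal"))  # contains Cyrillic small letter a
print(scripts_used("p\u0430ypal"))
```

A check like this rejects mixed-script look-alikes but says nothing about confusable characters within one script.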

@syzop

syzop commented Dec 6, 2017

@DanielOaks: I tried to contact you a while back via email (27 Nov 11:53 UTC) from syzop@vunscan.org. I could put part of that here in the open:

I've added experimental UTF8 support in set::allowed-nickchars in UnrealIRCd which allows the admin to allow certain utf8 characters in nick names. In the release notes I mention that, like the original set::allowed-nickchars, it does not do any special CASEMAPPING or "similar looking character detection", and summing up the known problems with the lack of such support. I also noticed that for example anope does not seem to allow such characters which further limits the current use.
So, I'm not happy with the present state. In practice for serious networks, it's not so much usable. It's more of an experimental thing so users can play around, hoping to get that UTF8 ball rolling a bit. It's sad to see that UTF8 nick name support is still lacking in IRC in 2017.

In my opinion the goal of IRCv3, or in any case the IRC community in general, should be to add a new CASEMAPPING in some standard way/library/tables so the same casemapping (and other stuff PRECIS does) is applied the same way to irc servers and services (and clients). If every software implementation is going to choose its own casemapping it's rather annoying and confusing. This is especially notable in the servers vs services case where f.e. account names are compared. The spec is just as important as having common code/lib/implementations.

In my email to you I also ask for some technical suggestions with regards to that.
Just checking you received it. If you did and don't think you have anything useful to reply, don't want to or don't have time, that's fine too of course. Just checking.. would be a pity if an opportunity for collaboration would be missed just by some misunderstanding / some mail ending up in Junk mail.

@DanielOaks
Member Author

Heyo @syzop! Sorry for not responding, emails have fallen behind a little with lots going on at work and home. I'm thinking of changing this proposal slightly to better integrate with existing servers that use something like CASEMAPPING=rfc1459, as well as clarifying more precisely in the spec the issues with this folding method and how to avoid those issues.

Totally agree with the standard way to do such a thing, that's been mostly the intention of this since it started coming up and since I threw this proposal in.

I'll respond to your email and either later this week or over the weekend throw those changes into this specification to clear things up and make it easier for existing servers to implement. Thanks for the push with this and I'm excited to see what comes of Unreal's new experimental char support :)

@syzop

syzop commented Dec 6, 2017

Great. And no problem at all! Glad to see your continued interest and look forward to working with you.

@jwheare
Member

jwheare commented Jan 3, 2018

s/preperation/preparation/g

Should there be a way to specify an allowed list of characters/sets as described in the visually similar section? Another ISUPPORT token?

@DanielOaks
Member Author

DanielOaks commented Jan 3, 2018

Thanks, will fix that spelling issue.

Nah, no way to do something like that. It'd end up being a multi-KB (maybe even MB) blob sent to the client on connection/registration, and we'd be inventing the format from scratch. The server can just refuse that channel/nick name with an appropriate error message in the numeric if they want to.

@jwheare
Member

jwheare commented Jan 3, 2018

Fair enough. Just wondering if something like Unreal's allowed-nickchars labels might work instead of a huge list of characters: https://www.unrealircd.org/docs/Nick_Character_Sets

hebrew-utf8, etc aren't really standard labels but maybe something similar exists?

@DanielOaks
Member Author

DanielOaks commented Jan 3, 2018

You could try, but they wouldn't be able to be used as anything more than a rough "Hey this is what we sorta allow".

In addition to standard character sets, since the issue is around confusable characters you'd also need to have the ability to list explicit characters as well, which combinations of characters specifically are allowed and which combinations aren't allowed together.

For instance, in my (admittedly fairly lax) server I'm looking at an approach that uses the confusables lists distributed by Unicode, along with character sets, to determine somewhat heuristically whether specific names are allowed. Those are the sorta details you can't really codify.

Given the above, I think just plain leaving it to the server and them explicitly telling the user why that nick/channel name isn't allowed would be the best option.

@DanielOaks
Member Author

As a note, there's a running implementation of this on testnet.oragono.io for anyone interested in giving it a shot and seeing how it works.

It should be noted that aside from the banned characters in the spec, it doesn't implement any sort of additional protections recommended (it's planned, just a lot of work to either build up whitelists or build up some sort of blacklist system based around Unicode's confusables list). This means it's pretty simple to get similar-looking nicks using this homograph attack generator, but that's just what you get when you don't implement proper protections. I don't consider this a spec issue because the spec leaves those specific protection mechanisms up for debate. We can't legislate a single good one because there isn't any defined good one. Even browsers change how they do it regularly, and they've had to deal with it a lot longer than us.

@DanielOaks
Member Author

Yo @slingamn if you could take a look over the two commits just added that'd be ace. You're the one that's looked deeper into the skeletonisation requirements so if I've got anything wrong just let me know ;)

@@ -73,7 +73,9 @@ As noted in the [Visually Similar Characters section](https://tools.ietf.org/htm

With the new allowed Unicode characters comes the ability to use characters that look the same. For example, `E (0x45)`, `Ε (U+0395)` (Greek Capital Letter Epsilon), and `Е (U+0415)` (Cyrillic Capital Letter IE) look the same in most fonts, but are treated as separate characters by this casemapping. More examples of these can be found in Unicode's [Confusables document](https://www.unicode.org/Public/security/latest/confusables.txt).

To combat this, we recommend only allowing characters from a single character set or locale to be used in names, or for the allowed characters to be a specific list of known, non-confusable characters. Other recommendations are available in the [Visually Similar Characters section](https://tools.ietf.org/html/rfc8264#section-12.5) of the PRECIS framework specification. Names that have the opportunity to be confusing SHOULD be disallowed by servers.
Unicode skeletonisation is the method we recommend to combat this. For each identifier (nick/channel name) on the server, a 'skeleton' is generated by taking the **casefolded** name, and then applying to it the transformations described in the [Unicode Security Mechanisms document](http://unicode.org/reports/tr39/#Confusable_Detection). These skeletons, if used, MUST ONLY be used for comparison, and not as any user-visible identifier (as they intentionally contain complicated mixes of scripts and characters). When users change nicknames or create new channels, the casefolded names should be compared and the skeletons should also be compared to ensure that both are globally-unique (with any non-unique names rejected outright). This seems to be the most reliable method as of right now, but does require storing the skeletons of all in-use names for comparison purposes.

Contributor

This isn't what we implemented in oragono --- we skeletonize the unfolded, original identifier (the one that is displayed to the users), and only then apply a round of width and case normalization. The rationale is that an initial round of casefolding may lose information about visual confusability. Hypothetically, you could have a non-Latin character with both uppercase and lowercase forms, such that its uppercase form is visually confusable with a Latin character but its lowercase form is visually distinct. Casefolding first would allow an impersonation attack using the uppercase character.
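
The ordering concern can be sketched as follows. The `CONFUSABLES` table is a tiny hand-picked subset assumed to mirror entries in Unicode's confusables data, the skeleton function is a simplified version of the transformation (decompose, map, decompose again), and NFKC plus `str.casefold` stands in for the width and case normalization. None of this is oragono's actual code.

```python
import unicodedata

# Hand-picked illustrative subset; a real table comes from Unicode's confusables data.
CONFUSABLES = {
    "\u0412": "B",  # Cyrillic Capital Letter Ve
    "\u0415": "E",  # Cyrillic Capital Letter IE
    "\u0395": "E",  # Greek Capital Letter Epsilon
}


def skeleton(s: str) -> str:
    """Simplified skeleton: decompose, map confusables, decompose again."""
    decomposed = unicodedata.normalize("NFD", s)
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def width_and_case_fold(s: str) -> str:
    # Stand-in for the width and case normalization round.
    return unicodedata.normalize("NFKC", s).casefold()


def skeleton_then_fold(s: str) -> str:
    return width_and_case_fold(skeleton(s))


def fold_then_skeleton(s: str) -> str:
    return skeleton(width_and_case_fold(s))


real = "BOB"        # all Latin
fake = "\u0412OB"   # Cyrillic Ve followed by Latin O and B

print(skeleton_then_fold(real) == skeleton_then_fold(fake))  # collision detected
print(fold_then_skeleton(real) == fold_then_skeleton(fake))  # collision missed
```

With this table, folding first turns the Cyrillic capital into its lowercase form, which the table does not map to a Latin letter, so the two names are treated as unrelated even though their displayed forms look alike.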

To be honest, I think it would help to see how this plays out in the wild --- try to get real-world user stories from people using non-Latin scripts, also see if we can get some Unicode experts to play with the implementation and try to break it.

@DanielOaks
Member Author

I think the only thing to fix with this spec is that we described our skeletonisation a bit incorrectly, but otherwise this should describe our implementation pretty well.

As far as a general i18n name solution, I like the prospect of display names more than this spec because of the simpler implementation (once you've got Metadata at least) and the flexibility. It takes a lot to keep Unicode identifiers unique, evident from our skeletonisation description above.

We're gonna keep this implementation in Oragono, but I might publish this as a vendor thing on our site instead of keeping it as a PR here. Particularly as PRECIS libraries that IRC servers written in C (and the like) could use don't really seem to be available.

@jwheare jwheare closed this Feb 25, 2021
9 participants