Improved collation support for non-ascii case-insensitive text-match

RFC4791 requires the server to support searching with two collations: `i;octet` for binary matching and `i;ascii-casemap` for case-insensitive matching, the latter being the default. In caldav v2.2.2 there is test code covering both "case sensitive" and "case insensitive" searches. The problem with `i;ascii-casemap` is that it only folds ASCII letters, so it fails to match naïve with NAÏVE, cliché with CLICHÉ, or smörgåsbord with SMÖRGÅSBORD, not to mention millions of words in non-English languages and entire non-Latin scripts.
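
A minimal sketch of why this happens, in Python. Following RFC4790, `i;ascii-casemap` maps only the ASCII letters a-z to A-Z and then compares octets, so a non-ASCII letter such as ï is never folded. The function names below are illustrative and not part of the caldav API; the substring test stands in for `CALDAV:text-match`, which RFC4791 defines as a substring match.

```python
def ascii_casemap(s: str) -> bytes:
    """Approximate i;ascii-casemap: upper-case only a-z, then encode to UTF-8 octets."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s).encode("utf-8")


def text_match_ascii(needle: str, haystack: str) -> bool:
    """Substring match (as in CALDAV:text-match) under the i;ascii-casemap rules."""
    return ascii_casemap(needle) in ascii_casemap(haystack)


for lower, upper in [("naive", "NAIVE"), ("naïve", "NAÏVE"),
                     ("cliché", "CLICHÉ"), ("smörgåsbord", "SMÖRGÅSBORD")]:
    # Only the pure-ASCII pair prints True; the others differ in an unfolded letter.
    print(lower, upper, text_match_ascii(lower, upper))
```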

RFC5051 (building on the collation registry of RFC4790) specifies an `i;unicode-casemap` collation, which a given server may or may not support. RFC4791 section 7.5.1 defines the `CALDAV:supported-collation-set` property, which lets a client ask the server which collations it supports.
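
As a sketch of the detection step: a client can PROPFIND the `CALDAV:supported-collation-set` property on a calendar collection and read the collation names out of the response. The helpers below (hypothetical names, standard library only) build the request body and parse the property value; the HTTP round trip is left out, and the sample value is an illustration in the shape RFC4791 defines, not output captured from a real server.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
CALDAV_NS = "urn:ietf:params:xml:ns:caldav"


def supported_collation_propfind_body() -> bytes:
    """Build a PROPFIND body requesting CALDAV:supported-collation-set (hypothetical helper)."""
    ET.register_namespace("D", DAV_NS)
    ET.register_namespace("C", CALDAV_NS)
    propfind = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(propfind, f"{{{DAV_NS}}}prop")
    ET.SubElement(prop, f"{{{CALDAV_NS}}}supported-collation-set")
    return ET.tostring(propfind, encoding="utf-8", xml_declaration=True)


def parse_supported_collations(xml_text: str) -> set[str]:
    """Collect every CALDAV:supported-collation value, wherever it is nested
    (a full multistatus response works as well as the bare property)."""
    root = ET.fromstring(xml_text)
    return {(el.text or "").strip()
            for el in root.iter(f"{{{CALDAV_NS}}}supported-collation")}


# Illustrative property value from a server advertising only the RFC4791 minimum.
sample = """<C:supported-collation-set xmlns:C="urn:ietf:params:xml:ns:caldav">
  <C:supported-collation>i;ascii-casemap</C:supported-collation>
  <C:supported-collation>i;octet</C:supported-collation>
</C:supported-collation-set>"""

print(supported_collation_propfind_body().decode("utf-8"))
print(sorted(parse_supported_collations(sample)))
```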

To resolve this issue:

* [ ] Case-insensitive searches should work for non-ASCII characters on all servers that support it.
* [ ] The library should detect non-ASCII characters in the search term and apply a workaround on servers that do not support case-insensitive matching of non-ASCII characters.
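
One possible shape for the detection-and-workaround logic, as a sketch only: a pure-ASCII search term is safe with `i;ascii-casemap`; otherwise prefer `i;unicode-casemap` when the server advertises it; failing that, search without the text filter and filter the results client-side. The function names are hypothetical, and the client-side folding (Unicode full case folding plus NFD) is an approximation that is not identical to the RFC5051 `i;unicode-casemap` algorithm.

```python
import unicodedata
from typing import Optional


def needs_unicode_folding(term: str) -> bool:
    """True if the term contains characters that i;ascii-casemap cannot fold."""
    return not term.isascii()


def choose_collation(term: str, supported: set[str]) -> Optional[str]:
    """Pick a server-side collation, or return None to request client-side filtering."""
    if not needs_unicode_folding(term):
        return "i;ascii-casemap"  # RFC4791 requires every server to support this one
    if "i;unicode-casemap" in supported:
        return "i;unicode-casemap"
    return None  # fall back: fetch without the text filter and filter locally


def unicode_fold(s: str) -> str:
    """Caseless key: NFD, full case folding, NFD again."""
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())


def client_side_match(term: str, value: str) -> bool:
    """Substring match used when the server cannot fold non-ASCII text."""
    return unicode_fold(term) in unicode_fold(value)


if __name__ == "__main__":
    server_collations = {"i;ascii-casemap", "i;octet"}  # the RFC4791 minimum
    print(choose_collation("naive", server_collations))  # i;ascii-casemap
    print(choose_collation("naïve", server_collations))  # None: filter client-side
    print(client_side_match("naïve", "A NAÏVE PLAN"))     # True
```

One design caveat: a substring test on NFD text also matches a base letter whose accent follows it (for example "nai" is found inside "naïve"), so a real workaround may need to check grapheme boundaries.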

Locale support would be nice (e.g. "istanbul" should match İstanbul in a Turkish locale) but is not required. This may be a very deep rabbit hole: one would like istanbul to match both İstanbul and Istanbul, but on the other hand, matching that is too liberal may produce too many false positives.
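
To illustrate the rabbit hole, the sketch below compares three hypothetical folding strategies on whole words. Plain Unicode case folding turns İ into i followed by U+0307 COMBINING DOT ABOVE, so "istanbul" does not equal "İstanbul". A Turkish-specific folding (İ to i, I to ı) fixes that but then rejects "Istanbul". Stripping all combining marks after folding accepts both, at the price of also equating "cliche" with "cliché". Only the Python standard library is used.

```python
import unicodedata


def fold_unicode(s: str) -> str:
    """Locale-independent full case folding, in NFD form."""
    return unicodedata.normalize("NFD", s.casefold())


def fold_turkish(s: str) -> str:
    """Turkish-style folding: dotted İ -> i, dotless I -> ı, then lower-case."""
    s = unicodedata.normalize("NFC", s)  # make sure İ is a single code point
    return unicodedata.normalize("NFD", s.replace("İ", "i").replace("I", "ı").lower())


def fold_liberal(s: str) -> str:
    """Case folding followed by removal of all combining marks (accents, dots)."""
    return "".join(c for c in fold_unicode(s) if not unicodedata.combining(c))


def same_word(fold, a: str, b: str) -> bool:
    return fold(a) == fold(b)


for name, fold in [("unicode", fold_unicode), ("turkish", fold_turkish),
                   ("liberal", fold_liberal)]:
    print(name,
          same_word(fold, "istanbul", "İstanbul"),
          same_word(fold, "istanbul", "Istanbul"),
          same_word(fold, "cliche", "cliché"))
# unicode False True False
# turkish True False False
# liberal True True True
```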

A good test case may include English loanwords like crème brûlée and naïve, typical Scandinavian words like Smörgåsbord and Blåbærsyltetøy, some French and Turkish words, and Ukrainian text.
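
A sketch of such a test corpus, written as plain (lower-case, upper-case) pairs so it can feed whatever test framework the project uses. The French, Turkish and Ukrainian words are illustrative picks, since the issue does not name any. The helpers are hypothetical: `caseless_equal` uses Unicode canonical caseless matching (NFD, case fold, NFD), which is close to but not the same as `i;unicode-casemap`, and `ascii_casemap_equal` mimics the RFC4791 default.

```python
import unicodedata

# (lower, upper) pairs that a Unicode-aware case-insensitive search should treat as equal.
CASE_PAIRS = [
    ("crème brûlée", "CRÈME BRÛLÉE"),      # English loanwords
    ("naïve", "NAÏVE"),
    ("smörgåsbord", "SMÖRGÅSBORD"),        # Scandinavian
    ("blåbærsyltetøy", "BLÅBÆRSYLTETØY"),
    ("noël", "NOËL"),                      # French
    ("şeker", "ŞEKER"),                    # Turkish
    ("київ", "КИЇВ"),                      # Ukrainian
]

# Locale-dependent pairs that plain Unicode folding is NOT expected to handle.
LOCALE_PAIRS = [
    ("istanbul", "İstanbul"),              # Turkish dotted capital I
]


def caseless_equal(a: str, b: str) -> bool:
    """Unicode canonical caseless match: NFD(casefold(NFD(x)))."""
    def key(s: str) -> str:
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())
    return key(a) == key(b)


def ascii_casemap_equal(a: str, b: str) -> bool:
    """Equality under i;ascii-casemap rules: only ASCII letters are folded."""
    def key(s: str) -> str:
        return "".join(c.upper() if c.isascii() else c for c in s)
    return key(a) == key(b)


if __name__ == "__main__":
    for lower, upper in CASE_PAIRS:
        assert caseless_equal(lower, upper), (lower, upper)
        assert not ascii_casemap_equal(lower, upper), (lower, upper)
    for lower, upper in LOCALE_PAIRS:
        assert not caseless_equal(lower, upper), (lower, upper)
    print("all pairs behave as expected")
```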

Even the `i;unicode-casemap` collation may not be sufficient for all languages; see the Istanbul example above.

There is an example file in the example directory that may need a brush-up as well.

Improved collation support for non-ascii case-insensitive text-match #567

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Improved collation support for non-ascii case-insensitive text-match #567

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions