Description
RFC 4791 requires the server to support two collations for searching: i;octet for binary matching and i;ascii-casemap for case-insensitive matching, with the latter being the default. In caldav v2.2.2 there is test code covering both "case sensitive" and "case insensitive" searches. The problem with i;ascii-casemap is that it only folds case for ASCII characters, so naïve does not match NAÏVE, cliché does not match CLICHÉ, and smörgåsbord does not match SMÖRGÅSBORD, not to mention millions of words in non-English languages, entire non-Latin scripts, etc.
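To make the mismatch concrete, here is a minimal sketch of an ASCII-only case mapping next to Python's Unicode case folding. The function names are made up for illustration and are not part of the caldav library; `str.casefold()` is used as a stand-in for a Unicode-aware comparison and is not the i;unicode-casemap algorithm itself.

```python
def ascii_casemap(text: str) -> str:
    """Upper-case a-z only; every other character passes through unchanged."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in text)


def equal_ascii_casemap(a: str, b: str) -> bool:
    """Compare the way an ASCII-only case-insensitive collation would."""
    return ascii_casemap(a) == ascii_casemap(b)


def equal_unicode_casefold(a: str, b: str) -> bool:
    """Compare using Unicode case folding (assumes both strings are NFC)."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    pairs = [("naive", "NAIVE"), ("naïve", "NAÏVE"), ("cliché", "CLICHÉ")]
    for lower, upper in pairs:
        print(lower, upper,
              equal_ascii_casemap(lower, upper),
              equal_unicode_casefold(lower, upper))
```

The pure-ASCII pair matches under both comparisons, while the accented pairs only match under the Unicode-aware one.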
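RFC 5051 specifies an i;unicode-casemap collation (RFC 4790 only defines the collation registry together with i;octet, i;ascii-casemap and i;ascii-numeric), which may or may not be supported by the server. RFC 4791 section 7.5.1 says that it's possible to ask the server which collations it supports.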
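As a rough sketch of the discovery step: RFC 4791 section 7.5.1 defines the CALDAV:supported-collation-set property, which can be requested with a PROPFIND on a calendar collection. The snippet below only builds the request body and parses a response with the standard library; the constant and function names are hypothetical, and how the request is actually sent (and whether caldav already exposes this) is left open.

```python
import xml.etree.ElementTree as ET

CALDAV_NS = "urn:ietf:params:xml:ns:caldav"

# PROPFIND body asking for the collations the server supports.
PROPFIND_BODY = (
    '<D:propfind xmlns:D="DAV:" xmlns:C="urn:ietf:params:xml:ns:caldav">'
    "<D:prop><C:supported-collation-set/></D:prop>"
    "</D:propfind>"
)


def parse_supported_collations(response_xml: str) -> set[str]:
    """Collect every C:supported-collation value found in a response body."""
    root = ET.fromstring(response_xml)
    tag = f"{{{CALDAV_NS}}}supported-collation"
    return {el.text.strip() for el in root.iter(tag) if el.text}
```

A server that only implements the mandatory collations would yield `{"i;ascii-casemap", "i;octet"}`; the presence of `i;unicode-casemap` in the set is what the library would need to check for.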
To resolve this issue:
- Case-insensitive searches should work for non-ASCII characters on all servers that support it.
- The library should detect non-ASCII characters and apply workarounds for servers that do not support case-insensitive matching on non-ASCII characters.
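One possible shape for the two points above, as a sketch only: decide per search term whether the server can be trusted, and otherwise fall back to filtering on the client. All names here (`pick_strategy`, `client_side_contains`, the strategy labels) are hypothetical and not existing caldav API, and the fallback assumes the candidate texts have already been fetched with a broader query.

```python
import unicodedata

UNICODE_CASEMAP = "i;unicode-casemap"


def fold(text: str) -> str:
    """Case-fold and normalise so composed and decomposed forms compare equal."""
    return unicodedata.normalize("NFC", text.casefold())


def pick_strategy(term: str, supported_collations: set[str]) -> str:
    """Choose where the case-insensitive match should happen."""
    if term.isascii():
        return "server-ascii"      # i;ascii-casemap is enough for this term
    if UNICODE_CASEMAP in supported_collations:
        return "server-unicode"    # let the server fold non-ASCII characters
    return "client-side"           # workaround: filter results locally


def client_side_contains(term: str, candidates: list[str]) -> list[str]:
    """Keep the candidates that contain the term, ignoring case."""
    needle = fold(term)
    return [text for text in candidates if needle in fold(text)]
```

The client-side path trades bandwidth for correctness, since the server has to return more events than will eventually match.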
Locale support would be nice (i.e. "istanbul" should match İstanbul in a Turkish locale), but is not required. This may be a very deep rabbit hole: one would like istanbul to match both İstanbul and Istanbul, but on the other hand there may be too many false positives if the matching is too liberal.
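The trade-off can be illustrated with a small sketch. Python's `str.casefold()` is locale-independent, so İ (U+0130) folds to "i" plus a combining dot and does not equal a plain "i". The two helper functions below are hypothetical illustrations, not a proposal for the actual implementation: one follows Turkish casing rules strictly, the other deliberately conflates dotted and dotless i.

```python
DOTTED_CAPITAL_I = "\u0130"   # İ
DOTLESS_SMALL_I = "\u0131"    # ı


def turkish_fold(text: str) -> str:
    """Fold case with Turkish rules: İ -> i and I -> ı."""
    text = text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I)
    return text.casefold()


def liberal_fold(text: str) -> str:
    """Treat İ, I, ı and i as the same letter (risks false positives)."""
    text = text.replace(DOTTED_CAPITAL_I, "i").replace(DOTLESS_SMALL_I, "i")
    return text.casefold()


if __name__ == "__main__":
    for name in ("İstanbul", "Istanbul"):
        print(name,
              name.casefold() == "istanbul",
              turkish_fold(name) == "istanbul",
              liberal_fold(name) == "istanbul")
```

With strict Turkish rules "istanbul" matches İstanbul but no longer matches Istanbul; the liberal variant matches both, at the cost of no longer distinguishing Turkish words that differ only in ı versus i.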
A good test-case may include English loan-words like crème brûlée and naïve, typical Scandinavian words like Smörgåsbord, Blåbærsyltetøy, some French and Turkish words, as well as Ukrainian text.
The i;unicode-casemap collation may not be sufficient to handle all languages; see the Istanbul example above.
There is an example file in the example directory that may need a brush-up as well.