Description
I did:
- Search to see if my issue has already been submitted
- Make sure I'm reporting something precise that needs to be fixed
- Give my issue a descriptive and concise title
- Create a minimal working example on JsFiddle or Codepen (or give a link to a demo on the Selectize docs)
- Indicate precise steps to reproduce in numbers and the result, like below
An option whose text starts with a non-ASCII (Unicode) character cannot be found by searching for that character.
Steps to reproduce:
- Use code from https://jsfiddle.net/w9gecnyo/4/
- Search for one of the two Unicode characters: "č" or "Č"
TL;DR Define two options, like "Čápkova" and "Ečerova", and then search for "č" or "Č" with respect_word_boundaries enabled (default).
Expected result:
Only the option "Čápkova" should be listed (the match is on the first letter, i.e. at a word boundary).
Actual result:
Only the option "Ečerova" is listed, presumably because a non-ASCII letter is not treated as a word character, so a boundary is detected between "E" and "č" but not before the leading "Č".
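As far as I can tell, this is caused by the \b that Sifter adds when respect_word_boundaries is true. This looks like a problem with the definition of \b itself (in JavaScript it only recognizes the ASCII characters A-Z, a-z, 0-9 and _ as word characters), so Unicode-aware word boundary detection needs some other trick.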
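The behaviour can be reproduced with a plain JavaScript regular expression, without Selectize or Sifter. This is a minimal sketch that assumes Sifter's word-boundary search behaves roughly like "\b + query" with the i flag; the pattern is hand-written and is not the regex Sifter actually generates (which may differ, e.g. in diacritics handling):

```javascript
// Hand-written approximation of a word-boundary search; NOT Sifter's code.
const options = ["Čápkova", "Ečerova"];

// Without the "u" flag, \w is [A-Za-z0-9_], so "č" and "Č" are non-word
// characters. \b therefore matches between "E" and "č" in "Ečerova",
// but not at the start of "Čápkova" (start of string + non-word char).
const query = /\bč/i;

const matches = options.filter((text) => query.test(text));
console.log(matches); // [ 'Ečerova' ]
```

Searching for "Č" instead gives the same result, since the i flag makes the two letters match each other.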
This attempt at regex101.com seems to confirm that:
SO seems to somewhat agree with this diagnosis:
https://stackoverflow.com/questions/10590098/javascript-regexp-word-boundaries-unicode-characters
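One possible "other trick", in the spirit of that thread, is to replace \b with a negative lookbehind over Unicode letters and digits. This is only a sketch under stated assumptions: it requires an ES2018+ engine (lookbehind and \p{...} property escapes with the u flag), assumes NFC-normalized text, and the helpers unicodeWordStart and escapeRegExp are hypothetical, not part of Selectize or Sifter. How it would be wired into Sifter's respect_word_boundaries handling is left open.

```javascript
// Hypothetical helper: escape regex syntax characters in user input.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: match `query` case-insensitively only at a
// "word start", i.e. when it is NOT preceded by a Unicode letter,
// Unicode number, or underscore.
function unicodeWordStart(query) {
  return new RegExp("(?<![\\p{L}\\p{N}_])" + escapeRegExp(query), "iu");
}

const options = ["Čápkova", "Ečerova"];
for (const q of ["č", "Č"]) {
  const re = unicodeWordStart(q);
  console.log(q, options.filter((text) => re.test(text)));
}
// č [ 'Čápkova' ]
// Č [ 'Čápkova' ]
```

Here "č" in "Ečerova" is rejected because it follows the letter "E", while the leading "Č" in "Čápkova" has nothing before it, which is the expected result described above.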