respect_word_boundaries: true breaks when first character of the search term is non-ASCII

I did:

* [x] Search for if my issue has already been submitted
* [x] Make sure I'm reporting something precise that needs to be fixed
* [x] Give my issue a descriptive and concise title
* [x] Create a *minimal* working example on JsFiddle or Codepen
	(or gave a link to a demo on the Selectize docs)
* [x] Indicate *precise* steps to reproduce in *numbers* and the result,
	  like below

An option whose text begins with a non-ASCII (Unicode) character cannot be found by searching for that character.

Steps to reproduce:
1. Use code from https://jsfiddle.net/w9gecnyo/4/
2. Search for one of the two Unicode characters: "č" or "Č"

TL;DR: Define two options, such as "Čápkova" and "Ečerova", then search for "č" or "Č" with `respect_word_boundaries` enabled (the default).

Expected result:
Only option "Čápkova" should be listed (there is a match on the first letter, i.e. word boundary).

Actual result:
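Only option "Ečerova" is listed, presumably because the non-ASCII character is not treated as a word character: no word boundary is detected before "Č" at the start of "Čápkova", while one is detected between "E" and "č" in "Ečerova".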

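As far as I can tell, this is caused by the `\b` that Sifter adds to the search regex when `respect_word_boundaries: true` is set. The problem seems to lie in how `\b` itself is defined: JavaScript treats only `[A-Za-z0-9_]` as word characters, so Unicode-aware word boundary detection needs a different approach.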
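
To illustrate the diagnosis outside of Selectize, here is a minimal sketch in plain JavaScript. The helper names `escapeRegex` and `wordStartRegex` are hypothetical and are not part of Sifter or Selectize; the `\b` variant only approximates what I assume Sifter generates. The Unicode-aware variant is one possible workaround, assuming the runtime supports lookbehind and `\p{...}` property escapes with the `u` flag.

```javascript
// Escape regex metacharacters in a user-supplied query.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a regex that matches `query` at the start of a word.
// unicodeAware = false: uses \b, which only knows ASCII word characters.
// unicodeAware = true: requires that no letter, digit, or underscore
// (in the Unicode sense) directly precedes the match.
function wordStartRegex(query, unicodeAware) {
  const q = escapeRegex(query);
  return unicodeAware
    ? new RegExp("(?<![\\p{L}\\p{N}_])" + q, "iu")
    : new RegExp("\\b" + q, "i");
}

const options = ["Čápkova", "Ečerova"];

// ASCII \b: only "Ečerova" matches (boundary between "E" and "č").
const asciiMatches = options.filter((o) => wordStartRegex("č", false).test(o));

// Unicode-aware lookbehind: only "Čápkova" matches (start of the word).
const unicodeMatches = options.filter((o) => wordStartRegex("č", true).test(o));

console.log(asciiMatches, unicodeMatches);
```

With the `\b` version the filter reproduces the actual result above, and with the lookbehind version it produces the expected result.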

This attempt at regex101.com seems to confirm that:

![screenshot](https://user-images.githubusercontent.com/2099568/202907406-a2d9c880-6183-45fb-9da2-56dfb3d914bd.png)

A Stack Overflow thread seems to agree with this diagnosis:
https://stackoverflow.com/questions/10590098/javascript-regexp-word-boundaries-unicode-characters

respect_word_boundaries: true breaks when first character of the search term is non-ASCII #1916