respect_word_boundaries: true breaks when first character of the search term is non-ASCII #1916

Open
5 tasks done
spacekpe opened this issue Nov 20, 2022 · 12 comments

Comments

@spacekpe

I did:

  • Search to see if my issue has already been submitted
  • Make sure I'm reporting something precise that needs to be fixed
  • Give my issue a descriptive and concise title
  • Create a minimal working example on JsFiddle or Codepen
    (or gave a link to a demo on the Selectize docs)
  • Indicate precise steps to reproduce in numbers and the result,
    like below

A non-ASCII (Unicode) character at the beginning of an option string cannot be found using search.

Steps to reproduce:

  1. Use code from https://jsfiddle.net/w9gecnyo/4/
  2. Search for one of the two Unicode characters: "č" or "Č"

TL;DR Define two options, like "Čápkova" and "Ečerova", and then search for "č" or "Č" with respect_word_boundaries enabled (default).

Expected result:
Only option "Čápkova" should be listed (there is a match on the first letter, i.e. word boundary).

Actual result:
Only option "Ečerova" is listed, presumably because a non-ASCII character is not treated as a word character, so no word boundary is detected in front of "Č" at the start of the string.

As far as I can tell, this is caused by the \b that Sifter adds when respect_word_boundaries: true is set. This looks like a problem with the definition of \b, which only knows ASCII word characters in JavaScript, so Unicode-aware word boundary detection needs some other trick.

This attempt at regex101.com seems to confirm that:

[screenshot of the regex101.com test]

SO seems to somewhat agree with this diagnosis:
https://stackoverflow.com/questions/10590098/javascript-regexp-word-boundaries-unicode-characters
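
The diagnosis above can be checked directly in JavaScript. The sketch below assumes the search pattern is built roughly as \b followed by the escaped search term, with the i flag. The helpers escapeRegExp and matchesWithAsciiBoundary are hypothetical names used for illustration only, not Sifter's actual code.

```javascript
// Hypothetical helper: escape regex syntax characters in a search term.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: mimics the ASSUMED "\b + term" pattern.
// This is an illustration, not the code Sifter really runs.
function matchesWithAsciiBoundary(term, option) {
  const pattern = new RegExp("\\b" + escapeRegExp(term), "i");
  return pattern.test(option);
}

console.log(matchesWithAsciiBoundary("č", "Čápkova")); // false
console.log(matchesWithAsciiBoundary("č", "Ečerova")); // true
console.log(matchesWithAsciiBoundary("e", "Ečerova")); // true
```

In JavaScript, \b is defined in terms of the ASCII-only \w class ([A-Za-z0-9_]). "Č" at the start of a string therefore has no boundary in front of it, while the "č" in "Ečerova" directly follows the word character "E" and so sits on a boundary.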

@risadams
Contributor

For now, I'm reverting the default value of respect_word_boundaries to false. This will work the same as it did prior to the introduction of the new feature. I do think that we need much better Unicode support in general, which will be a bigger fix.

Good catch!

risadams added a commit that referenced this issue Nov 22, 2022
This addresses a bad default that caused issues with non-ASCII characters in Sifter, and defaults to headless Chrome instead of the unmaintained PhantomJS for unit testing
@heyyo-droid

Hi guys,
I think I'm facing the same issue, but in my case searching for a Hebrew letter doesn't return anything.

  • For example, searching for: ש
  • English letters are OK.

https://jsfiddle.net/sw9Lkcdy/4/

@heyyo-droid

Any chance that the change of the respect_word_boundaries default to false will be part of a release?
We are using the library from npmjs, which doesn't provide a dev version.
https://www.npmjs.com/package/@selectize/selectize

@AndersFreund

I found a related issue with a workaround that overrides respect_word_boundaries.
The code below makes respect_word_boundaries default to false.
It fixed the problem for me. Please let us know if a more elegant solution exists.

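// Workaround: wrap the original getSearchOptions so that every Selectize
// instance searches with respect_word_boundaries disabled.
var getSearchOptions = Selectize.prototype.getSearchOptions;
Selectize.prototype.getSearchOptions = function () {
	// Call the original implementation, then override this one setting.
	var options = getSearchOptions.apply(this, arguments);
	options.respect_word_boundaries = false;
	return options;
};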

@github-actions
Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days

@pspacek

pspacek commented Jun 20, 2023

Issues don't magically fix themselves, do they? (That's a reaction to the bot.)

@big-dream

big-dream commented Oct 10, 2023

This problem also occurs when searching for Chinese text. With 一二三 among the options, the option cannot be found by typing its characters.
example: https://codepen.io/big-dream-the-solid/pen/poqGWrB

My workaround for this problem is to use an older version, such as 4.6.9.

@rcuhljr

rcuhljr commented Nov 6, 2023

The example from @heyyo-droid also breaks with the English dash character (-): if you have an item like "Item - 3", it is filtered out as soon as you type a dash.
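
The dash case fails for the same underlying reason: "-" is not a word character, so \b finds no boundary between a space and a dash. One possible Unicode-aware alternative, sketched below, replaces \b with a negative lookbehind on Unicode letters, digits and underscore. It assumes a JavaScript engine that supports lookbehind and Unicode property escapes. The names escapeRegExp and matchesAtWordStart are hypothetical and not part of Selectize or Sifter, and the sketch ignores details such as combining marks.

```javascript
// Hypothetical helper: escape regex syntax characters in a search term.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: the term matches only where it is NOT preceded by a
// Unicode letter, a Unicode digit, or an underscore. This is a sketch of one
// possible approach, not the library's implementation.
function matchesAtWordStart(term, option) {
  const pattern = new RegExp(
    "(?<![\\p{L}\\p{N}_])" + escapeRegExp(term),
    "iu"
  );
  return pattern.test(option);
}

console.log(matchesAtWordStart("č", "Čápkova"));  // true
console.log(matchesAtWordStart("č", "Ečerova"));  // false
console.log(matchesAtWordStart("-", "Item - 3")); // true
console.log(matchesAtWordStart("ש", "שלום"));     // true
```

For scripts written without spaces, such as Chinese, this still only matches at the start of a run of letters, so whether word-boundary matching is desirable there at all remains an open design question.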

Contributor

github-actions bot commented Mar 6, 2024

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days

@spacekpe
Author

spacekpe commented Mar 7, 2024

Bot, this issue is still relevant

Contributor

github-actions bot commented Jul 6, 2024

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days

@bplace

bplace commented Jul 10, 2024

Hi, can't we just enable Unicode support in the regular expression?

See the initial example with the u flag activated:

[screenshot, 2024-07-10 10:40:00]
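
Whether the u flag helps depends on the regex flavor. In JavaScript, as far as the ECMAScript specification goes, the u flag does not make \b Unicode-aware, so the quick check below may be worth running before relying on it. The boundaryMatch helper is a hypothetical name and, for brevity, does not escape the search term.

```javascript
// Hypothetical helper: test "\b + term" against an option with given flags.
// The term is NOT escaped here; this is only a quick experiment.
function boundaryMatch(term, option, flags) {
  return new RegExp("\\b" + term, flags).test(option);
}

console.log(boundaryMatch("č", "Čápkova", "i"));  // false
console.log(boundaryMatch("č", "Čápkova", "iu")); // false
console.log(boundaryMatch("č", "Ečerova", "iu")); // true
```

If the screenshot was taken with a non-JavaScript flavor selected, that could explain a different result. That is an assumption, since the image itself is not available here.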
