Closed
Description
Currently, 'lookup_begins()' in the FST manager only covers the Latin Unicode range in its regex, for performance reasons. At scale, queries were found to take 20% more time when the regex supports a wider alphabet (not sure why!).
We should map the Unicode ranges of each script, and use the first letter of the suggest word to build a regex that matches only the alphabet of the provided word.
LOOKUP_REGEX_RANGE_LATIN will have siblings: LOOKUP_REGEX_RANGE_CYRILLIC, etc.
Use the following range database: http://kourge.net/projects/regexp-unicode-block
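A rough sketch of the idea, in Python for illustration only (the ticket does not say which language the FST manager is written in). The range values are the standard Unicode blocks for Latin (Basic Latin through Latin Extended-B) and Cyrillic, and may differ from what the code base uses today. 'SCRIPT_RANGES', 'range_for_word()' and 'build_lookup_regex()' are hypothetical names, not existing functions.

```python
import re

# Only the two constant names come from this ticket; the values are assumed.
# Latin: Basic Latin through Latin Extended-B. Cyrillic: the Cyrillic block.
LOOKUP_REGEX_RANGE_LATIN = r"\u0000-\u024F"
LOOKUP_REGEX_RANGE_CYRILLIC = r"\u0400-\u04FF"

# Hypothetical lookup table: (first code point, last code point, regex range).
SCRIPT_RANGES = [
    (0x0000, 0x024F, LOOKUP_REGEX_RANGE_LATIN),
    (0x0400, 0x04FF, LOOKUP_REGEX_RANGE_CYRILLIC),
]


def range_for_word(word):
    """Return the regex range for the script of the word's first letter.

    Returns None when the word is empty or its script is not mapped.
    """
    if not word:
        return None
    code = ord(word[0])
    for start, end, regex_range in SCRIPT_RANGES:
        if start <= code <= end:
            return regex_range
    return None


def build_lookup_regex(word):
    """Build a 'begins with' regex restricted to the word's own alphabet."""
    regex_range = range_for_word(word)
    if regex_range is None:
        return None
    # The prefix is matched literally; the tail may only use the same script.
    return re.compile(re.escape(word) + "[" + regex_range + "]*")
```

With this shape, each query still runs against a single narrow range, which is the property the Latin-only regex relies on for performance. Whether that actually avoids the 20% slowdown would need to be measured.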