FST lookup should support more scripts #64

Closed
@valeriansaliou

Description

Currently, 'lookup_begins()' in the FST manager only covers the Latin Unicode range in its regex, for performance reasons. At scale, queries were found to take 20% more time when the regex supports a wider alphabet (not sure why!).

We should map all Unicode ranges per script, and use the first letter of the suggested word to build a regex that matches only the alphabet of the provided word.

LOOKUP_REGEX_RANGE_LATIN will have siblings: LOOKUP_REGEX_RANGE_CYRILLIC, etc.

Use the following range database: http://kourge.net/projects/regexp-unicode-block
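
The per-script selection proposed above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual FST manager code: the `ScriptRange` struct, the `SCRIPT_RANGES` table and the `regex_range_for`, `escape_regex` and `build_lookup_regex` helpers are hypothetical names, the constant values are illustrative Unicode block bounds rather than the ones used in the codebase, and the shape of the final pattern is a guess.

```rust
// Hypothetical sketch. Only the constant *names* come from the issue;
// their values, and every other name below, are assumptions.

// Character-class fragments per script (illustrative Unicode block bounds).
const LOOKUP_REGEX_RANGE_LATIN: &str = "\\x{0000}-\\x{024F}";
const LOOKUP_REGEX_RANGE_CYRILLIC: &str = "\\x{0400}-\\x{052F}";

// Associates a code point interval with the regex fragment for that script.
struct ScriptRange {
    start: u32,
    end: u32,
    regex_range: &'static str,
}

const SCRIPT_RANGES: &[ScriptRange] = &[
    // Basic Latin through Latin Extended-B
    ScriptRange { start: 0x0000, end: 0x024F, regex_range: LOOKUP_REGEX_RANGE_LATIN },
    // Cyrillic and Cyrillic Supplement
    ScriptRange { start: 0x0400, end: 0x052F, regex_range: LOOKUP_REGEX_RANGE_CYRILLIC },
];

// Picks the regex fragment from the first letter of the word.
// Returns None for an empty word or an unmapped script.
fn regex_range_for(word: &str) -> Option<&'static str> {
    let first = word.chars().next()? as u32;

    SCRIPT_RANGES
        .iter()
        .find(|range| range.start <= first && first <= range.end)
        .map(|range| range.regex_range)
}

// Backslash-escapes regex metacharacters so the word is matched literally.
fn escape_regex(word: &str) -> String {
    let mut escaped = String::with_capacity(word.len());

    for character in word.chars() {
        if "\\.+*?()|[]{}^$#&-~".contains(character) {
            escaped.push('\\');
        }
        escaped.push(character);
    }

    escaped
}

// Builds a "begins with" pattern restricted to the alphabet of the word.
fn build_lookup_regex(word: &str) -> Option<String> {
    let range = regex_range_for(word)?;

    Some(format!("{}([{}]*)", escape_regex(word), range))
}
```

With this layout, supporting another script would only mean adding a constant and one table entry, while each query still compiles a regex over a single narrow range, which is the property the performance observation above suggests preserving.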

Labels

enhancement: Enhancement to an existing feature
