[Bug] Spanish analyzer not normalizing all accented words. #1956
Comments
I've found where the issue comes from. Looking at `bleve/analysis/lang/es/light_stemmer_es.go`, the normalization of accented letters only happens if the input is at least 5 characters long, something that neither `guía` nor `fría` satisfies.
PR opened addressing this issue: #1957
Thanks for raising the pull request, @svera. The team will review it soon.
Normalization of accented letters only happens if the input is at least 5 characters long, a condition that, for example, neither `guía` nor `fría` (4 characters each) meets. The solution is to always run the accent normalization by moving it into a separate file, just as is done in the German analyzer. Fixes: #1956
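The proposed split can be illustrated with a minimal, stdlib-only Go sketch: accent folding becomes its own step that runs on every token, and the stemmer (with its length guard left intact) runs afterwards. The names `normalizeAccents`, `stemLight` and `analyze` are hypothetical, and the suffix rules are assumed to follow Lucene's SpanishLightStemmer; this is not bleve's actual code or API. In bleve itself the normalizer would presumably be a token filter placed before the stemmer in the analyzer chain, as the German analyzer does.

```go
package main

import (
	"fmt"
	"strings"
)

// accentFolder folds the accented vowels that the light stemmer handles
// into their plain forms (hypothetical helper, for illustration only).
var accentFolder = strings.NewReplacer(
	"à", "a", "á", "a", "â", "a", "ä", "a",
	"ò", "o", "ó", "o", "ô", "o", "ö", "o",
	"è", "e", "é", "e", "ê", "e", "ë", "e",
	"ù", "u", "ú", "u", "û", "u", "ü", "u",
	"ì", "i", "í", "i", "î", "i", "ï", "i",
)

// normalizeAccents is a hypothetical stand-alone normalization step.
// It runs on every token, whatever its length.
func normalizeAccents(tokens []string) []string {
	out := make([]string, len(tokens))
	for i, t := range tokens {
		out[i] = accentFolder.Replace(t)
	}
	return out
}

// stemLight is a simplified light stemmer with the accent folding removed.
// It keeps the 5-rune guard, so shorter words are returned as-is.
func stemLight(word string) string {
	r := []rune(word)
	n := len(r)
	if n < 5 {
		return word
	}
	switch r[n-1] {
	case 'o', 'a', 'e':
		return string(r[:n-1])
	case 's':
		if r[n-2] == 'e' && r[n-3] == 's' && r[n-4] == 'e' {
			return string(r[:n-2])
		}
		if r[n-2] == 'e' && r[n-3] == 'c' {
			r[n-3] = 'z' // e.g. "luces" -> "luz"
			return string(r[:n-2])
		}
		if r[n-2] == 'o' || r[n-2] == 'a' || r[n-2] == 'e' {
			return string(r[:n-2])
		}
	}
	return word
}

// analyze runs normalization first and stemming second, mirroring a
// filter chain in which the normalizer sits before the stemmer.
func analyze(tokens []string) []string {
	normalized := normalizeAccents(tokens)
	out := make([]string, len(normalized))
	for i, t := range normalized {
		out[i] = stemLight(t)
	}
	return out
}

func main() {
	fmt.Println(analyze([]string{"guía", "fría", "guías", "comeré"}))
}
```

Under these assumptions the program prints `[guia fria gui comer]`: the 4-letter words are now folded to `guia` and `fria`, even though the stemmer's length guard still leaves them otherwise unstemmed.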
Hi,
I've been working with the Spanish analyzer to index documents in Spanish, and I found an issue with some accented words that are not normalised as they should be. This is easily reproducible using the Bleve text analysis wizard: choose the `es` analyzer and put `fría` or `guía` in the Text to analyze input box. The words are kept as they are; however, the plural forms of these words (`guías` and `frías`) work as expected.

Other accented words such as `tentación` or `comeré` are correctly stemmed to `tentacion` and `comer`, respectively.
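To see why the singular forms slip through, here is a minimal, stdlib-only Go sketch of the stemming step. It is a simplified re-implementation assumed to follow Lucene's SpanishLightStemmer (which `light_stemmer_es.go` is assumed to port), not bleve's actual source; `stemLight` is a hypothetical name. The point it shows is that the 5-rune guard returns short words before the accent-folding loop ever runs.

```go
package main

import "fmt"

// stemLight is a simplified, illustrative light stemmer (not bleve's code).
// Inputs shorter than 5 runes are returned unchanged, so accent folding
// never runs on them.
func stemLight(word string) string {
	r := []rune(word)
	n := len(r)
	if n < 5 {
		return word // guard: short words skip normalization and stemming
	}
	// Accent folding: only reached for words of 5 or more runes.
	for i, c := range r {
		switch c {
		case 'à', 'á', 'â', 'ä':
			r[i] = 'a'
		case 'ò', 'ó', 'ô', 'ö':
			r[i] = 'o'
		case 'è', 'é', 'ê', 'ë':
			r[i] = 'e'
		case 'ù', 'ú', 'û', 'ü':
			r[i] = 'u'
		case 'ì', 'í', 'î', 'ï':
			r[i] = 'i'
		}
	}
	// Light suffix stripping.
	switch r[n-1] {
	case 'o', 'a', 'e':
		return string(r[:n-1])
	case 's':
		if r[n-2] == 'e' && r[n-3] == 's' && r[n-4] == 'e' {
			return string(r[:n-2])
		}
		if r[n-2] == 'e' && r[n-3] == 'c' {
			r[n-3] = 'z'
			return string(r[:n-2])
		}
		if r[n-2] == 'o' || r[n-2] == 'a' || r[n-2] == 'e' {
			return string(r[:n-2])
		}
	}
	return string(r)
}

func main() {
	for _, w := range []string{"guía", "fría", "guías", "frías", "tentación", "comeré"} {
		fmt.Printf("%s -> %s\n", w, stemLight(w))
	}
}
```

Under these assumptions, `guía` and `fría` come back untouched, while `guías` becomes `gui`, `frías` becomes `fri`, `tentación` becomes `tentacion` and `comeré` becomes `comer`.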