Elasticsearch version (bin/elasticsearch --version): 7.3.1
Plugins installed: []
JVM version (java -version): Embedded Java 11
OS version (uname -a if on a Unix-like system):
Description: When the _analyze API endpoint is used on an index that defines a normalizer in its index settings, the output comes from the default analyzer instead of the normalizer under test.
Steps to reproduce:
Create a test index:
```
PUT word_delimiter_test
{
  "settings": {
    "analysis": {
      "char_filter": {
        "filter_noisy_characters": {
          "pattern": "[.-:\"]",
          "type": "pattern_replace",
          "replacement": " "
        },
        "convert_dots": {
          "flags": "CASE_INSENSITIVE",
          "pattern": "\\.(net|js|io)",
          "type": "pattern_replace",
          "replacement": "dot$1"
        }
      },
      "filter": {
        "word_delimiter": {
          "split_on_numerics": true,
          "generate_word_parts": true,
          "generate_number_parts": true,
          "catenate_all": true,
          "type": "word_delimiter_graph",
          "type_table": [
            "# => ALPHA",
            "+ => ALPHA"
          ]
        },
        "synonym": {
          "type": "synonym_graph",
          "synonyms": [
            "casp, comptia advanced security practitioner"
          ]
        }
      },
      "analyzer": {
        "test": {
          "char_filter": [
            "convert_dots"
          ],
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "synonym",
            "word_delimiter",
            "flatten_graph"
          ]
        }
      },
      "normalizer": {
        "languages_normalizer": {
          "filter": [
            "trim"
          ],
          "type": "custom",
          "char_filter": [
            "convert_dots",
            "filter_noisy_characters"
          ]
        }
      }
    }
  }
}
```
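To predict what `languages_normalizer` should emit for a given input, the chain can be approximated locally. This is only a rough sketch, assuming Python's `re` module interprets these two patterns the same way the Java regex engine used by `pattern_replace` does; the function name `approximate_normalizer` is hypothetical and not part of Elasticsearch.

```python
import re

# Approximations of the two char filters from the index settings above.
# Assumption: Python's `re` handles these patterns like Java's regex engine.
CONVERT_DOTS = re.compile(r"\.(net|js|io)", re.IGNORECASE)  # "flags": "CASE_INSENSITIVE"
NOISY_CHARACTERS = re.compile(r'[.-:"]')  # same character class as in the settings


def approximate_normalizer(text: str) -> str:
    """Apply convert_dots, then filter_noisy_characters, then trim."""
    text = CONVERT_DOTS.sub(r"dot\1", text)  # "replacement": "dot$1"
    text = NOISY_CHARACTERS.sub(" ", text)   # "replacement": " "
    return text.strip()                      # stands in for the "trim" token filter


if __name__ == "__main__":
    for sample in ("Node.js", "a.b", "Wi-fi"):
        print(repr(sample), "->", repr(approximate_normalizer(sample)))
```

Whatever the input, the result is a single string, which is what a normalizer is expected to return as its only token. Note that inside a character class `.-:` is read as a range from `.` through `:` (which covers `/` and the digits) rather than as three literal characters, so it may be worth checking the class against the characters it is meant to match.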
Then call the _analyze endpoint to test the normalizer:
```
GET word_delimiter_test/_analyze
{
  "text": "Wi-fi",
  "normalizer": "languages_normalizer"
}
```
The expected output is "wifi".
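Because a normalizer keeps the whole input as one token, an `_analyze` response can be checked mechanically for this bug. A minimal sketch, assuming the response body has already been parsed into a Python dict; the helper names `token_texts` and `looks_normalized` are hypothetical.

```python
def token_texts(response: dict) -> list:
    """Return the token strings from a parsed _analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]


def looks_normalized(response: dict) -> bool:
    """A normalizer emits exactly one token; more than one suggests an analyzer ran."""
    return len(token_texts(response)) == 1


if __name__ == "__main__":
    # Trimmed-down, illustrative responses (only the "token" field is kept).
    split_response = {"tokens": [{"token": "wi"}, {"token": "fi"}]}
    single_response = {"tokens": [{"token": "wifi"}]}
    print(token_texts(split_response), looks_normalized(split_response))
    print(token_texts(single_response), looks_normalized(single_response))
```

The check only counts tokens; it does not verify that the single token has the expected text.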
Instead, the text is tokenized as if by the default analyzer:
```json
{
  "tokens" : [
    {
      "token" : "wi",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "fi",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
```