Description
Elasticsearch version (bin/elasticsearch --version): 6.3.0
Plugins installed: []
JVM version (java -version): 1.8.0_171
OS version (uname -a if on a Unix-like system): Darwin Kernel Version 17.7.0
Description of the problem including expected versus actual behavior:
I'm experiencing an issue with the word_delimiter_graph token filter in combination with the pattern_capture token filter. I would expect the document below to be indexed without errors, but instead the request fails with an illegal_argument_exception.
I suspect this might be related to the following Lucene issue, but I'm not completely sure: LUCENE-8509
Steps to reproduce:
- Create index
PUT /test_index
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "remove_zero_padding": {
            "type": "pattern_capture",
            "patterns": [
              "^0+(.*)"
            ]
          },
          "split_on_numerics": {
            "type": "word_delimiter_graph",
            "preserve_original": true,
            "split_on_case_change": false,
            "stem_english_possessive": false,
            "split_on_numerics": true
          }
        },
        "analyzer": {
          "breaking_analyzer": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": [
              "remove_zero_padding",
              "split_on_numerics"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "number": {
          "type": "text",
          "analyzer": "breaking_analyzer"
        }
      }
    }
  }
}
- Insert document
POST test_index/_doc
{
  "number": "000ABCD"
}
- Error displayed
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=7,lastStartOffset=3 for field 'number'"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=7,lastStartOffset=3 for field 'number'"
  },
  "status": 400
}
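The message comes from the check Lucene applies to every token at index time: a token's start offset must not be lower than the start offset of the token emitted before it, and its end offset must not be lower than its own start offset. Below is a simplified Python model of that check, not Lucene's actual code. The token stream in it is hypothetical: it was picked only so the numbers match the error above (a token at offsets 0-7 arriving after one that starts at 3) and is not the verified output of remove_zero_padding followed by split_on_numerics.

```python
from typing import List, Tuple

# (term, start_offset, end_offset) in the order the analyzer emits them.
Token = Tuple[str, int, int]

# HYPOTHETICAL stream, chosen only to reproduce the numbers in the error above;
# it is not the verified output of remove_zero_padding + split_on_numerics.
HYPOTHETICAL_STREAM: List[Token] = [
    ("000ABCD", 0, 7),
    ("000", 0, 3),
    ("ABCD", 3, 7),
    ("ABCD", 0, 7),  # starts before the previous token's start offset (3)
]


def check_offsets(tokens: List[Token], field: str) -> None:
    """Simplified model of the per-token offset check done at index time."""
    last_start = 0  # starting at 0 also rejects negative start offsets
    for _term, start, end in tokens:
        if start < last_start or end < start:
            raise ValueError(
                "startOffset must be non-negative, and endOffset must be >= startOffset, "
                "and offsets must not go backwards "
                f"startOffset={start},endOffset={end},lastStartOffset={last_start} "
                f"for field '{field}'"
            )
        last_start = start


if __name__ == "__main__":
    try:
        check_offsets(HYPOTHETICAL_STREAM, "number")
    except ValueError as err:
        print(err)
```

Whichever of the two filters actually emits the out-of-order token, the stream reaching the index has to keep start offsets non-decreasing for the document to be accepted.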