word_delimiter_graph filter not working in combination with pattern_capture filter #34741

Closed
@derkcrezee

Elasticsearch version (bin/elasticsearch --version): 6.3.0

Plugins installed: []

JVM version (java -version): 1.8.0_171

OS version (uname -a if on a Unix-like system): Darwin Kernel Version 17.7.0

Description of the problem including expected versus actual behavior:
I'm experiencing an issue with the word_delimiter_graph token filter when it is combined with the pattern_capture filter. I expect the document below to be indexed without errors, but indexing fails with an illegal_argument_exception.

I suspect this is related to the following Lucene issue, but I'm not sure: LUCENE-8509

Steps to reproduce:

  1. Create index
PUT /test_index
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "remove_zero_padding": {
            "type": "pattern_capture",
            "patterns": [
              "^0+(.*)"
            ]
          },
          "split_on_numerics": {
            "type": "word_delimiter_graph",
            "preserve_original": true,
            "split_on_case_change": false,
            "stem_english_possessive": false,
            "split_on_numerics": true
          }
        },
        "analyzer": {
          "breaking_analyzer": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": [
              "remove_zero_padding",
              "split_on_numerics"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "number": {
          "type": "text",
          "analyzer": "breaking_analyzer"
        }
      }
    }
  }
}
  2. Insert document
POST test_index/_doc
{
  "number": "000ABCD"
}
  3. Error displayed
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=7,lastStartOffset=3 for field 'number'"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=7,lastStartOffset=3 for field 'number'"
  },
  "status": 400
}
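
One way to narrow this down is to run the same text through the _analyze API with explain enabled and compare the offsets each filter emits. This is only a sketch and assumes the test_index from step 1 already exists. The error reports startOffset=0 after lastStartOffset=3, so the thing to look for is a token whose start_offset is lower than that of the token emitted before it.

```
GET /test_index/_analyze
{
  "analyzer": "breaking_analyzer",
  "text": "000ABCD",
  "explain": true
}
```

This request does not index anything, so it should return the token stream for each stage of the analyzer rather than the exception.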
