
Elasticsearch: custom word-splitting index setting which works in 5.X fails with error in 6.X "illegal_argument_exception" #28474

Closed
@MorrieAtElastic


Describe the feature:

Elasticsearch version (bin/elasticsearch --version): 6.1, 6.2

Plugins installed: []

JVM version (java -version):1.08

OS version (uname -a if on a Unix-like system): ubuntu14

Description of the problem including expected versus actual behavior:

The following index settings and mapping script works in versions 5.3.x and 5.6.x but fails when run against 6.1 and 6.2:

DELETE test
PUT test
{
	"settings": {
		"index": {
			"number_of_shards": 1,
			"number_of_replicas": 0,
			"analysis": {
				"filter": {
					"split_words": {
						"split_on_numerics": false,
						"generate_word_parts": true,
						"type": "word_delimiter",
						"preserve_original": true,
						"stem_english_possessive": false
					}
				},
				"analyzer": {
					"path": {
						"filter": [
							"split_words"
						],
						"tokenizer": "file_path_tokenizer"
					}
				},
				"tokenizer": {
					"file_path_tokenizer": {
						"reverse": "true",
						"type": "path_hierarchy"
					}
				}
			}
		}
	},
	"mappings": {
		"doc": {
			"dynamic": "false",
			"properties": {
				"name": {
					"type": "text",
					"analyzer": "path"
				}
			}
		}
	}
}

POST test/doc/1
{"name": "HKLM/SOFTWARE/soft"}

GET /test/_analyze
{
  "field": "name",
  "text": [
    "HKLM/SOFTWARE/soft"
  ]
}

Comments

  • The script above is a simplified version of the original script; it still reproduces the problem.

  • When run against Elasticsearch 5.x, the POST command in the script above indexes the document; when run against 6.1 or 6.2, the same command returns the following error:

> {"name": "HKLM\\SOFTWARE\\soft"}'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=5,endOffset=18,lastStartOffset=14 for field 'name'"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=5,endOffset=18,lastStartOffset=14 for field 'name'"
  },
  "status" : 400
}

  • The error does not reproduce if the "split_words" filter is removed from the settings.
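
One plausible reading of the offsets in the error message can be sketched in Python. This is a simplified model, not Elasticsearch or Lucene code. It assumes that path_hierarchy with "reverse" emits each path suffix with its own offsets, that word_delimiter with "preserve_original" emits the original token followed by its alphanumeric parts, and that 6.x rejects any token whose start offset is lower than the previous token's. The function names are hypothetical.

```python
import re

def reverse_path_tokens(text, delimiter="/"):
    """Model of path_hierarchy with reverse=true: one (term, start, end) per path suffix."""
    tokens = [(text, 0, len(text))]
    for i, ch in enumerate(text):
        if ch == delimiter:
            tokens.append((text[i + 1:], i + 1, len(text)))
    return tokens

def split_words(token):
    """Model of word_delimiter with preserve_original=true.

    Emits the original token, then each alphanumeric run with its own offsets.
    Case-change splitting is ignored here (assumption: not needed for this input).
    """
    term, start, _end = token
    out = [token]
    for m in re.finditer(r"[A-Za-z0-9]+", term):
        part = (m.group(), start + m.start(), start + m.end())
        if part != token:
            out.append(part)
    return out

def first_backwards_offset(tokens):
    """Return the offsets of the first token that violates the ordering rule, else None."""
    last_start = 0
    for _term, start, end in tokens:
        if start < 0 or end < start or start < last_start:
            return {"startOffset": start, "endOffset": end, "lastStartOffset": last_start}
        last_start = start
    return None

text = "HKLM/SOFTWARE/soft"
path_only = reverse_path_tokens(text)
with_filter = [part for tok in path_only for part in split_words(tok)]

print(first_backwards_offset(path_only))    # None
print(first_backwards_offset(with_filter))  # {'startOffset': 5, 'endOffset': 18, 'lastStartOffset': 14}
```

Under these assumptions, the tokenizer output alone has non-decreasing start offsets, which fits the observation that removing "split_words" avoids the error. With the filter, the part "soft" (start 14) is emitted before the next suffix "SOFTWARE/soft" (start 5), which gives the same three numbers as the reported message.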
