Description
Elasticsearch version: 7.3.2, dockerized
image: docker.elastic.co/elasticsearch/elasticsearch:7.3.2
Plugins installed: none
JVM version (java -version):
openjdk version "12.0.2" 2019-07-16
OpenJDK Runtime Environment (build 12.0.2+10)
OpenJDK 64-Bit Server VM (build 12.0.2+10, mixed mode, sharing)
OS version (uname -a if on a Unix-like system):
Linux 7f94601adc38 4.20.7-042007-generic #201902061234 SMP Wed Feb 6 17:36:40 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
I'm using a predicate_token_filter to keep only the first X tokens of a stream. For this I use the following filter configuration:
{ "myPredicatefilter": { "type": "predicate_token_filter", "script": { "source": "token.getPosition() <= 1" } } }
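The sketch below models the behavior I expect from this predicate. It is a minimal Python model, not Elasticsearch code, and `apply_position_predicate` is a hypothetical name. Positions restart at 0 for every analyzed text, so `token.getPosition() <= 1` keeps the first two tokens. Keeping the first X tokens corresponds to `position < X`. The whitespace tokenizer is approximated with `str.split()`.

```python
from typing import Callable, List


def apply_position_predicate(text: str, predicate: Callable[[int], bool]) -> List[str]:
    """Split on whitespace and keep tokens whose position satisfies predicate.

    Positions restart at 0 on every call, which is what I expect
    predicate_token_filter to do for each new text.
    """
    kept = []
    for position, token in enumerate(text.split()):
        if predicate(position):
            kept.append(token)
    return kept


# Equivalent of the script "token.getPosition() <= 1": same result on every call.
print(apply_position_predicate("pain grillé", lambda pos: pos <= 1))
print(apply_position_predicate("pain grillé", lambda pos: pos <= 1))

# "Keep the first 3 tokens" expressed as position < 3.
print(apply_position_predicate("a b c d", lambda pos: pos < 3))
```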
But every time I use this analyzer, the token positions seen by the script seem to keep increasing. After a few calls, the filter no longer produces any tokens.
Here is a video showing the problem:
Steps to reproduce:
Complete index settings:
PUT issue-predicate-token-filter
{
  "settings": {
    "analysis": {
      "filter": {
        "myPredicatefilter": {
          "type": "predicate_token_filter",
          "script": {
            "source": "token.getPosition() <= 1"
          }
        }
      },
      "analyzer": {
        "myPredicateAnalyzer": {
          "filter": [
            "myPredicatefilter"
          ],
          "type": "custom",
          "tokenizer": "whitespace"
        }
      }
    }
  }
}
Analyze request:
POST issue-predicate-token-filter/_analyze
{
  "analyzer": "myPredicateAnalyzer",
  "text": "pain grillé"
}
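The sketch below repeats the analyze call several times without using Kibana. It is a minimal Python script that uses only the standard library. It assumes an unsecured cluster at http://localhost:9200 and that the index above already exists. The helpers `analyze_request`, `token_count` and `reproduce` are hypothetical names.

```python
import json
import urllib.request

# Assumption: local, unsecured cluster on the default HTTP port.
BASE_URL = "http://localhost:9200"
INDEX = "issue-predicate-token-filter"


def analyze_request(text: str, explain: bool = False) -> urllib.request.Request:
    """Build (but do not send) the POST <index>/_analyze request shown above."""
    body = {"analyzer": "myPredicateAnalyzer", "text": text}
    if explain:
        body["explain"] = True
    return urllib.request.Request(
        f"{BASE_URL}/{INDEX}/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_count(response_body: dict) -> int:
    """Count tokens in a non-explain _analyze response."""
    return len(response_body.get("tokens", []))


def reproduce(calls: int = 5) -> None:
    """Send the same analyze request several times and print the token count."""
    for i in range(1, calls + 1):
        with urllib.request.urlopen(analyze_request("pain grillé")) as resp:
            body = json.loads(resp.read().decode("utf-8"))
        print(f"call {i}: {token_count(body)} token(s)")
```

Call `reproduce()` against a running node. Based on the results below, I would expect 2 tokens on the first call and 0 on later ones. The exact call on which the tokens disappear may vary.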
First result:
{
  "tokens" : [
    {
      "token" : "pain",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "grillé",
      "start_offset" : 5,
      "end_offset" : 11,
      "type" : "word",
      "position" : 1
    }
  ]
}
Second result, and all calls afterwards:
{
  "tokens" : [ ]
}
The analyzer works again for one call after a _close / _open of the index. Also, if I set "explain": true in the analyze request, the analyzer works without any problem:
POST issue-predicate-token-filter/_analyze
{
  "analyzer": "myPredicateAnalyzer",
  "text": "pain grillé",
  "explain": true
}
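One possible explanation is that the filter keeps a running position counter for the script. That counter might not be reset when Lucene reuses the analyzer's cached token stream for the next request. This is an assumption, not a confirmed diagnosis. A fresh filter instance would start counting again. That would plausibly happen on the explain path, or after _close / _open rebuilds the analyzer.

The Python model below only mimics that hypothesis. `PositionPredicateFilter` and its `reset_position_on_reset` flag are invented for illustration.

```python
from typing import Callable, List


class PositionPredicateFilter:
    """Toy model of a filtering token filter whose script sees a position.

    Hypothetical: the counter starts at -1 and grows by 1 per token
    (a position increment of 1, as with the whitespace tokenizer).
    """

    def __init__(self, predicate: Callable[[int], bool],
                 reset_position_on_reset: bool = False) -> None:
        self.predicate = predicate
        self.reset_position_on_reset = reset_position_on_reset
        self.position = -1

    def reset(self) -> None:
        # Modeled bug: unless the flag is set, the counter survives reuse.
        if self.reset_position_on_reset:
            self.position = -1

    def filter(self, tokens: List[str]) -> List[str]:
        self.reset()  # called once per analyzed text, like TokenStream.reset()
        kept = []
        for token in tokens:
            self.position += 1
            if self.predicate(self.position):
                kept.append(token)
        return kept


def keep_first_two(pos: int) -> bool:
    return pos <= 1  # same test as "token.getPosition() <= 1"


tokens = "pain grillé".split()

# Reused instance, no position reset: the second call sees positions 2 and 3.
reused = PositionPredicateFilter(keep_first_two)
print(reused.filter(tokens), reused.filter(tokens))  # ['pain', 'grillé'] []

# New instance per request, as on the explain path or after _close / _open.
print(PositionPredicateFilter(keep_first_two).filter(tokens))

# Reused instance that resets the counter: every call gives the same result.
fixed = PositionPredicateFilter(keep_first_two, reset_position_on_reset=True)
print(fixed.filter(tokens), fixed.filter(tokens))
```

Lucene's Analyzer caches token stream components per thread. If the hypothesis holds, the first call handled by each thread could still succeed. That may be why the tokens sometimes disappear only after a few calls.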
You can see the weird behavior by adding a Debug.explain call to the filter script:
PUT issue-predicate-token-filter-with-debug
{
  "settings": {
    "analysis": {
      "filter": {
        "myPredicatefilter": {
          "type": "predicate_token_filter",
          "script": {
            "source": "Debug.explain(token.getPosition())"
          }
        }
      },
      "analyzer": {
        "myPredicateAnalyzer": {
          "filter": [
            "myPredicatefilter"
          ],
          "type": "custom",
          "tokenizer": "whitespace"
        }
      }
    }
  }
}
POST issue-predicate-token-filter-with-debug/_analyze
{
  "analyzer": "myPredicateAnalyzer",
  "text": "pain grillé",
  "explain": false
}
You will see the token.getPosition() value increasing after each call.
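Debug.explain throws an exception on the first token it sees, so each debug call stops after one token. Under the same unconfirmed hypothesis of a position counter that is never reset, the reported value would then grow by one per call. The toy Python model below shows this. `ExplainError` is a hypothetical stand-in for the exception Painless throws, and `ReusedFilterModel` is invented for illustration. The values printed are the model's, not output captured from Elasticsearch.

```python
class ExplainError(Exception):
    """Hypothetical stand-in for the exception raised by Debug.explain."""

    def __init__(self, value: int) -> None:
        super().__init__(value)
        self.value = value


class ReusedFilterModel:
    """Toy model: a position counter that is never reset between calls."""

    def __init__(self) -> None:
        self.position = -1

    def analyze(self, tokens: list) -> list:
        if tokens:
            self.position += 1  # position assigned to the first token
            # Like Debug.explain(token.getPosition()): abort on the first token.
            raise ExplainError(self.position)
        return []


model = ReusedFilterModel()  # reused across calls, like cached components
seen = []
for _ in range(3):
    try:
        model.analyze("pain grillé".split())
    except ExplainError as err:
        seen.append(err.value)
print(seen)  # [0, 1, 2]
```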