Fix "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards" #206

Merged on Jun 5, 2019 (2 commits)

Conversation

abia321
Contributor

@abia321 abia321 commented May 31, 2019

After setting the parameter "ignore_pinyin_offset" to false and bulk-writing data to a field analyzed with pinyin, the exception "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards" is thrown.
Testing showed that reset() should also set this.processedSortCandidate = false, which fixes the problem.
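
The kind of stale-state problem described above can be illustrated with a minimal sketch. This is not the plugin's actual code: the class `ReusableStream`, its `candidates` list, and the `resetFlag` switch are hypothetical, and the assumption that `processedSortCandidate` guards a one-time sort step is inferred only from the flag's name. It shows why a stream object that is reused across documents (as happens during bulk writes) can emit out-of-order offsets if the flag survives `reset()`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a reusable token stream; not the plugin's real class. */
class ReusableStream {
    private final boolean resetFlag; // true = clear the flag in reset(), as the PR proposes
    private final List<Integer> candidates = new ArrayList<>();
    private boolean processedSortCandidate = false;

    ReusableStream(boolean resetFlag) {
        this.resetFlag = resetFlag;
    }

    /** Called before each new document; loads that document's candidate start offsets. */
    void reset(List<Integer> startOffsets) {
        candidates.clear();
        candidates.addAll(startOffsets);
        if (resetFlag) {
            // Without this line the flag set by the previous document leaks into the next one.
            processedSortCandidate = false;
        }
    }

    /** Returns the start offsets in the order they would be emitted. */
    List<Integer> emit() {
        if (!processedSortCandidate) {
            Collections.sort(candidates); // assumed one-time sort step
            processedSortCandidate = true;
        }
        return new ArrayList<>(candidates);
    }
}
```

With `resetFlag` set to false, the first document's offsets come out sorted, but a second document loaded with offsets `[1, 0]` is emitted as `[1, 0]` because the sort is skipped. With `resetFlag` set to true, every document is sorted.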

abia321 and others added 2 commits May 30, 2019 14:55
…t, and offsets must not go backwards fix


@wqmain wqmain left a comment

Still getting the error. This is the master branch of 6.x; the Maven build produced the 6.3.0 zip file: elasticsearch-analysis-pinyin-6.3.0.zip

@oIdmonk

oIdmonk commented Jul 29, 2020

This bug has not actually been fixed.

@medcl
Member

medcl commented Jul 29, 2020

@oIdmonk, could you please provide the steps to reproduce? Thanks.

@oIdmonk

oIdmonk commented Jul 29, 2020 via email

@medcl
Member

medcl commented Jul 29, 2020 via email

@oIdmonk

oIdmonk commented Jul 29, 2020 via email

@medcl
Member

medcl commented Jul 30, 2020

elasticsearch-analysis-pinyin-7.2.0.zip
@oIdmonk Try this package.

@bashen1291

bashen1291 commented Jul 30, 2020

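The problem still exists in 7.5.1. The specific trigger: if the text starts with English followed by segmented words, indexing produces a token whose start offset falls behind lastStartOffset.
To reproduce, use the example from the README and configure ignore_pinyin_offset=false: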

GET /medcl/_analyze
{
  "text": ["liu 德华"],
  "analyzer": "pinyin_analyzer"
}

Result:

{
    "tokens": [
        {
            "token": "liu",
            "start_offset": 1,
            "end_offset": 4,
            "type": "word",
            "position": 0
        },
        {
            "token": "liu 德华",
            "start_offset": 0,
            "end_offset": 6,
            "type": "word",
            "position": 0
        },
       
    ]
}
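
To make the failure in this output concrete, here is a minimal sketch of the rule spelled out in the exception message, applied to the two tokens shown. The class `OffsetCheck` and its method are hypothetical helpers written for illustration; they follow the wording of the message (non-negative start, end not before start, start not before the previous start) and are not copied from Lucene or the plugin.

```java
/** Hypothetical helper that applies the rule stated in the exception message. */
class OffsetCheck {
    /**
     * Each token is {startOffset, endOffset}, in emission order.
     * Returns the index of the first token that breaks the rule, or -1 if all pass.
     */
    static int firstViolation(int[][] tokens) {
        int lastStartOffset = 0;
        for (int i = 0; i < tokens.length; i++) {
            int start = tokens[i][0];
            int end = tokens[i][1];
            // Start must be non-negative, end must not precede start,
            // and start must not go backwards relative to the previous token.
            if (start < 0 || end < start || start < lastStartOffset) {
                return i;
            }
            lastStartOffset = start;
        }
        return -1;
    }
}
```

For the tokens above, `firstViolation(new int[][] {{1, 4}, {0, 6}})` returns 1: "liu" starts at 1, then "liu 德华" starts at 0, which is behind the previous start offset. Emitting the same two tokens in the opposite order passes the check.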

@medcl

@huanghui-liao

This also occurs in version 7.9.3.

@liqunlin

liqunlin commented Oct 21, 2021

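With ES 6.8.8, ik 6.8.8, and pinyin 6.8.8, writing index data reports the same error. Data can be written, but about 1 in every 1000 documents is lost. Which ik and pinyin versions resolve this?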
org.elasticsearch.hadoop.rest.EsHadoopRemoteException: illegal_argument_exception: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=6,endOffset=7,lastStartOffset=7 for field 'content'
@medcl @abia321

@ytk929

ytk929 commented Jan 17, 2022

> With ES 6.8.8, ik 6.8.8, and pinyin 6.8.8, writing index data reports the same error. Data can be written, but about 1 in every 1000 documents is lost. Which ik and pinyin versions resolve this? org.elasticsearch.hadoop.rest.EsHadoopRemoteException: illegal_argument_exception: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=6,endOffset=7,lastStartOffset=7 for field 'content' @medcl @abia321

ik 6.6.1 should work. There is no need to change the ES version: unzip ik and edit the ES version in its descriptor file so that it matches.

@54huige

54huige commented Jan 20, 2022

> With ES 6.8.8, ik 6.8.8, and pinyin 6.8.8, writing index data reports the same error. Data can be written, but about 1 in every 1000 documents is lost. Which ik and pinyin versions resolve this? org.elasticsearch.hadoop.rest.EsHadoopRemoteException: illegal_argument_exception: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=6,endOffset=7,lastStartOffset=7 for field 'content' @medcl @abia321

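@liqunlin
Did you get this solved? I still hit this problem with ES, ik, and pinyin all at version 7.16.2.

@ytk929 Any idea why my versions still have this problem? They seem fairly recent.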

9 participants