@Ryan19929

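Fix PinyinTokenFilter sort flag not being reset

Problem

When the keyword tokenizer is combined with the pinyin filter (any analyzer that includes the pinyin filter is affected), the token order returned by the first analysis differs from the order returned by every later run.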

  • Analyze [GT40] with the following filter settings
  "ignore_pinyin_offset": "true",
  "keep_first_letter": "false",
  "keep_none_chinese_in_joined_full_pinyin": "false",
  "keep_none_chinese_together": "true",
  "keep_original": "true",
  "limit_first_letter_length": 16,
  "lowercase": "true",
  "remove_duplicated_term": "false",
  "type": "pinyin"
  • First request
{
  "tokens": [
    {
      "token": "g",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "gt40",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "t",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 1
    },
    {
      "token": "40",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 2
    }
  ]
}
  • Subsequent requests
{
  "tokens": [
    {
      "token": "g",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "t",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 1
    },
    {
      "token": "40",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 2
    },
    {
      "token": "gt40",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 2
    }
  ]
}

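Cause

PinyinTokenFilter.resetVariable() does not reset processedSortCandidate, so the flag stays set after the first token stream and the sort step is skipped whenever the filter instance is reused.

Fix

Add this.processedSortCandidate = false; to resetVariable()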
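
To illustrate why a stale flag produces exactly this ordering difference, here is a minimal, self-contained sketch. It is not the plugin's code: SortFlagSketch, Term, ReusableFilter, and the hard-coded positions are hypothetical stand-ins, and it assumes the filter sorts its candidate terms by position once per stream, guarded by processedSortCandidate.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a reusable filter; only the flag name mirrors the PR.
class SortFlagSketch {

    // A candidate term with the position it should be emitted at (assumed values).
    static final class Term {
        final String text;
        final int position;

        Term(String text, int position) {
            this.text = text;
            this.position = position;
        }
    }

    static final class ReusableFilter {
        private final boolean resetSortFlag; // true = behaviour with the fix applied
        private final List<Term> candidates = new ArrayList<>();
        private boolean processedSortCandidate = false;

        ReusableFilter(boolean resetSortFlag) {
            this.resetSortFlag = resetSortFlag;
        }

        // Simulates analyzing "GT40" once; returns the emitted tokens joined by commas.
        String analyze() {
            resetVariable();
            // Candidates are collected in generation order, original term last.
            candidates.add(new Term("g", 0));
            candidates.add(new Term("t", 1));
            candidates.add(new Term("40", 2));
            candidates.add(new Term("gt40", 0));
            // The sort runs only while the flag is still false.
            if (!processedSortCandidate) {
                processedSortCandidate = true;
                candidates.sort(Comparator.comparingInt(t -> t.position)); // stable sort
            }
            StringBuilder out = new StringBuilder();
            for (Term t : candidates) {
                if (out.length() > 0) {
                    out.append(',');
                }
                out.append(t.text);
            }
            return out.toString();
        }

        // Per-stream state reset; without the flag reset, later runs skip the sort.
        private void resetVariable() {
            candidates.clear();
            if (resetSortFlag) {
                processedSortCandidate = false;
            }
        }
    }

    public static void main(String[] args) {
        ReusableFilter buggy = new ReusableFilter(false);
        System.out.println(buggy.analyze()); // g,gt40,t,40
        System.out.println(buggy.analyze()); // g,t,40,gt40

        ReusableFilter fixed = new ReusableFilter(true);
        System.out.println(fixed.analyze()); // g,gt40,t,40
        System.out.println(fixed.analyze()); // g,gt40,t,40
    }
}
```

In this sketch the unfixed instance reproduces the two orderings reported above (sorted on the first run, generation order afterwards), while the instance that resets the flag returns the same order on every run.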

@medcl
Member

medcl commented Dec 9, 2025

@kin122 please help to review this PR, thanks :)
