[Feature Request]: Use NLP word breaker in fulltext query. #21774

@fengttt

Description

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

At the moment we use simple space/punctuation breaking for Latin text and 3-grams
for CJK at BOTH indexing time and query time.

Sometimes it is more desirable to use an NLP word breaker.

Describe the feature you'd like

Our word breaker is pluggable. We can use any word breaker at either indexing time or query time.

For Chinese, for example, we could use https://github.com/yanyiwu/gojieba

There are both advantages and disadvantages to using an NLP word breaker. I would suggest that, for now, we keep the simple 3-gram breaker at indexing time. At query time, especially when matching in natural language mode, we should consider using an NLP word breaker.

Describe implementation you've considered

No response

Documentation, Adoption, Use Case, Migration Strategy

Additional information

No response

Labels

kind/feature, priority/p0 (Critical feature that should be implemented in this version)