Is there an existing issue for the same feature request?
Is your feature request related to a problem?
Currently we use simple space/punctuation breaking for Latin text and 3-grams
for CJK text, at BOTH indexing time and query time.
Sometimes an NLP word breaker is more desirable.
Describe the feature you'd like
Our word breaker is pluggable, so we can use any word breaker at either indexing time or query time.
For Chinese, for example, we could use https://github.com/yanyiwu/gojieba
There are both advantages and disadvantages to using an NLP word breaker. I suggest that, for now, we keep the simple 3-gram breaker at indexing time. At query time, especially when matching in natural language mode, we should consider using an NLP word breaker.
Describe implementation you've considered
No response
Documentation, Adoption, Use Case, Migration Strategy
Additional information
No response