feat: Add Chinese and English analyzers and refactor the jieba tokenizer #37494
base: master
@aoiasd E2e jenkins job failed, comment
@aoiasd go-sdk check failed, comment
@aoiasd cpp-unit-test check failed, comment
b6dcfc9
to
65cf5eb
Compare
65cf5eb
to
cae4b7d
Compare
cae4b7d
to
e544b5b
Compare
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff              @@
##           master    #37494       +/-   ##
============================================
+ Coverage   68.15%    78.53%   +10.37%
============================================
  Files         290      1349     +1059
  Lines       25392    189210   +163818
============================================
+ Hits        17306    148592   +131286
- Misses       8086     35233    +27147
- Partials        0      5385     +5385
/run-cpu-e2e |
e544b5b
to
5493bff
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: aoiasd
5493bff
to
1c1817e
Compare
relate: #37419
1c1817e
to
4a062c2
Compare
/lgtm thanks!
rerun ut
4a062c2
to
b5c1765
Compare
9ce208a
to
98cfda8
Compare
98cfda8
to
fce7377
Compare
internal/core/thirdparty/tantivy/tantivy-binding/src/jieba_tokenizer.rs
}

pub(crate) fn get_stop_words_list(str_list:Vec<String>) -> Vec<String>{
Would it be better to take &[String] as the parameter?
In Rust, passing a parameter by value moves ownership of the struct into the function; it does not copy the heap data. Here that means the Vec passed in is freed when get_stop_words_list returns.
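To make the trade-off in this thread concrete, here is a minimal sketch of the two signatures. The functions `non_empty_owned` and `non_empty_borrowed` are hypothetical stand-ins written for illustration; they are not the body of get_stop_words_list from this PR.

```rust
// Hypothetical helpers for illustration only; not code from this PR.

/// Takes ownership of the Vec. The Strings are moved into the result
/// without cloning, and the input Vec is freed when the function returns.
fn non_empty_owned(words: Vec<String>) -> Vec<String> {
    words.into_iter().filter(|w| !w.is_empty()).collect()
}

/// Borrows a slice. The caller keeps its Vec, but every String that ends
/// up in the result has to be cloned (a new heap allocation per word).
fn non_empty_borrowed(words: &[String]) -> Vec<String> {
    words.iter().filter(|w| !w.is_empty()).cloned().collect()
}

fn main() {
    let words = vec!["the".to_string(), String::new(), "of".to_string()];

    // Borrowing leaves `words` usable afterwards.
    let kept = non_empty_borrowed(&words);
    println!("{} of {} kept", kept.len(), words.len());

    // Passing by value moves `words`; using it after this call would not compile.
    let kept = non_empty_owned(words);
    println!("{:?}", kept);
}
```

If the caller no longer needs the list after the call, the by-value form avoids the per-word clones; if the caller must keep the list, the slice form avoids cloning the whole Vec at the call site.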
/lgtm
@@ -127,30 +157,34 @@ impl AnalyzerBuilder<'_>{
// build with filter if filter param exist
builder=self.build_filter(builder, value)?;
},
"max_token_length" => {
Why was this deleted?
In Elasticsearch (es), a tokenizer's max token length splits long tokens rather than removing them. We don't have an equivalent function yet, so this option is removed.
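The difference between the two behaviours can be sketched as follows. Both helpers, `remove_long` and `split_long`, are hypothetical and written only for illustration; lengths are counted in chars here, which is an assumption and may not match how either engine measures token length.

```rust
// Hypothetical helpers for illustration only; not code from this PR.

/// "Remove" semantics: tokens longer than `max_len` chars are dropped.
fn remove_long(tokens: &[&str], max_len: usize) -> Vec<String> {
    tokens
        .iter()
        .filter(|t| t.chars().count() <= max_len)
        .map(|t| t.to_string())
        .collect()
}

/// "Split" semantics: tokens longer than `max_len` chars are cut into
/// consecutive pieces of at most `max_len` chars each.
fn split_long(tokens: &[&str], max_len: usize) -> Vec<String> {
    assert!(max_len > 0, "max_len must be positive");
    let mut out = Vec::new();
    for t in tokens {
        let chars: Vec<char> = t.chars().collect();
        for chunk in chars.chunks(max_len) {
            out.push(chunk.iter().collect::<String>());
        }
    }
    out
}

fn main() {
    let tokens = ["ab", "abcdefg"];
    println!("{:?}", remove_long(&tokens, 3));
    println!("{:?}", split_long(&tokens, 3));
}
```

With a limit of 3, the "remove" variant keeps only "ab", while the "split" variant keeps "ab" and turns "abcdefg" into "abc", "def", "g". Exposing the first behaviour under an option name that users associate with the second would be misleading, which is consistent with the reason given for removing the option.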
internal/core/thirdparty/tantivy/tantivy-binding/src/tokenizer.rs
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
fce7377
to
2053502
Compare
New changes are detected. LGTM label has been removed.
relate: #35853