feat: Add chinese and english analyzer with refactor jieba tokenizer #37494

Open
aoiasd wants to merge 1 commit into master from doc-in-tokenizer-3

Conversation

aoiasd
Contributor

@aoiasd aoiasd commented Nov 7, 2024

relate: #35853

@sre-ci-robot sre-ci-robot added the size/L Denotes a PR that changes 100-499 lines. label Nov 7, 2024
@mergify mergify bot added dco-passed DCO check passed. kind/feature Issues related to feature request from users labels Nov 7, 2024
Contributor

mergify bot commented Nov 7, 2024

@aoiasd E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Nov 7, 2024

@aoiasd go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Nov 7, 2024

@aoiasd cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

mergify bot commented Nov 7, 2024

@aoiasd E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Nov 7, 2024

@aoiasd cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

mergify bot commented Nov 7, 2024

@aoiasd go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Nov 7, 2024

@aoiasd cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

mergify bot commented Nov 7, 2024

@aoiasd E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Nov 7, 2024

@aoiasd go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Nov 7, 2024

@aoiasd E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.


codecov bot commented Nov 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.53%. Comparing base (2630717) to head (2053502).
Report is 14 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #37494       +/-   ##
===========================================
+ Coverage   68.15%   78.53%   +10.37%     
===========================================
  Files         290     1349     +1059     
  Lines       25392   189210   +163818     
===========================================
+ Hits        17306   148592   +131286     
- Misses       8086    35233    +27147     
- Partials        0     5385     +5385     
Components Coverage Δ
Client 61.25% <ø> (∅)
Core 68.07% <ø> (-0.09%) ⬇️
Go 80.76% <ø> (∅)
Files with missing lines Coverage Δ
internal/util/function/bm25_function.go 80.00% <ø> (ø)

... and 1066 files with indirect coverage changes

@aoiasd
Contributor Author

aoiasd commented Nov 8, 2024

/run-cpu-e2e

Contributor

mergify bot commented Nov 8, 2024

@aoiasd E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@aoiasd
Contributor Author

aoiasd commented Nov 8, 2024

/run-cpu-e2e

Contributor

mergify bot commented Nov 8, 2024

@aoiasd E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@aoiasd
Contributor Author

aoiasd commented Nov 8, 2024

/run-cpu-e2e

@aoiasd
Contributor Author

aoiasd commented Nov 8, 2024

/run-cpu-e2e

Contributor

mergify bot commented Nov 8, 2024

@aoiasd E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@aoiasd
Contributor Author

aoiasd commented Nov 8, 2024

/run-cpu-e2e

@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aoiasd
To complete the pull request process, please assign congqixia after the PR has been reviewed.
You can assign the PR to them by writing /assign @congqixia in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

mergify bot commented Nov 11, 2024

@aoiasd go-sdk check failed, comment rerun go-sdk can trigger the job again.

@aoiasd
Contributor Author

aoiasd commented Nov 12, 2024

relate: #37419

@zhengbuqian
Collaborator

/lgtm

thanks!

@aoiasd
Contributor Author

aoiasd commented Nov 12, 2024

rerun ut

@sre-ci-robot sre-ci-robot removed the lgtm label Nov 12, 2024
@aoiasd aoiasd force-pushed the doc-in-tokenizer-3 branch 2 times, most recently from 9ce208a to 98cfda8 Compare November 12, 2024 09:08
@sre-ci-robot sre-ci-robot added the area/dependency Pull requests that update a dependency file label Nov 12, 2024
Contributor

mergify bot commented Nov 12, 2024

@aoiasd E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

}

pub(crate) fn get_stop_words_list(str_list:Vec<String>) -> Vec<String>{
Contributor

Would taking &[String] as the parameter be better?

Contributor Author

@aoiasd aoiasd Nov 13, 2024

Passing a parameter by value in Rust moves ownership of the struct into the function; it does not deep-copy the data. Here, that means the Vec passed in is freed when get_stop_words_list returns.
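
To make the trade-off in this thread concrete, here is a minimal sketch of the two signatures. The function names and the filtering logic are hypothetical stand-ins, not the body of get_stop_words_list from the PR; the point is only who owns the Vec and where cloning happens.

```rust
/// By value: ownership of the Vec moves into the function (no deep copy).
/// The strings are moved into the result, and whatever remains of the
/// input is dropped when the function returns.
/// Hypothetical helper, not the PR's actual function.
fn keep_non_empty_owned(words: Vec<String>) -> Vec<String> {
    words.into_iter().filter(|w| !w.is_empty()).collect()
}

/// By slice: the caller keeps ownership of the Vec, but returning owned
/// Strings requires cloning each kept element.
/// Hypothetical helper, not the PR's actual function.
fn keep_non_empty_borrowed(words: &[String]) -> Vec<String> {
    words.iter().filter(|w| !w.is_empty()).cloned().collect()
}
```

If the caller no longer needs the list after the call, the by-value form avoids the per-string clones; if the caller still needs it, the slice form avoids forcing a clone of the whole Vec at the call site.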

@chyezh
Contributor

chyezh commented Nov 13, 2024

/lgtm

@@ -127,30 +157,34 @@ impl AnalyzerBuilder<'_>{
// build with filter if filter param exist
builder=self.build_filter(builder, value)?;
},
"max_token_length" => {
Collaborator

Why delete this?

Contributor Author

In Elasticsearch, the tokenizer's max_token_length splits long tokens rather than removing them. We don't have an equivalent function yet, so this option is removed.
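
The difference between the two behaviours can be sketched as follows. Both functions are hypothetical illustrations written for this thread; neither is taken from the PR or from any tokenizer library, and lengths are counted in characters as an assumption.

```rust
/// Splitting: a token longer than `max_len` characters is cut into
/// consecutive chunks of at most `max_len` characters each.
/// Hypothetical helper for illustration only.
fn split_long_tokens(tokens: &[&str], max_len: usize) -> Vec<String> {
    assert!(max_len > 0, "max_len must be positive");
    let mut out: Vec<String> = Vec::new();
    for token in tokens {
        // Work on chars so multi-byte characters are never cut in half.
        let chars: Vec<char> = token.chars().collect();
        for chunk in chars.chunks(max_len) {
            out.push(chunk.iter().collect::<String>());
        }
    }
    out
}

/// Removing: a token longer than `max_len` characters is dropped entirely.
/// Hypothetical helper for illustration only.
fn remove_long_tokens(tokens: &[&str], max_len: usize) -> Vec<String> {
    tokens
        .iter()
        .filter(|t| t.chars().count() <= max_len)
        .map(|t| t.to_string())
        .collect()
}
```

With a limit of 2, the tokens "hello" and "hi" become "he", "ll", "o", "hi" under splitting, but only "hi" under removal, so the two options are not interchangeable.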

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
@sre-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@mergify mergify bot removed the ci-passed label Nov 13, 2024
Contributor

mergify bot commented Nov 13, 2024

@aoiasd E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@aoiasd
Contributor Author

aoiasd commented Nov 13, 2024

/run-cpu-e2e

Contributor

mergify bot commented Nov 13, 2024

@aoiasd E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@aoiasd
Contributor Author

aoiasd commented Nov 13, 2024

/run-cpu-e2e

Labels
area/dependency Pull requests that update a dependency file dco-passed DCO check passed. kind/feature Issues related to feature request from users size/L Denotes a PR that changes 100-499 lines.
5 participants