[Feature]: Support insert texts into Milvus directly and use BM25 to search #35853

Open
1 task done
zhengbuqian opened this issue Aug 30, 2024 · 1 comment
Labels
kind/feature Issues related to feature request from users

Comments

@zhengbuqian
Collaborator

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

Currently, to perform BM25-based text relevance search on top of vector ANN search, we have to:

  1. Gather the entire corpus to collect its statistics, such as term frequency and inverse document frequency
  2. Compute the doc embeddings and insert those into Milvus as sparse embeddings
  3. Compute the query embeddings and search using IP metric
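
The three steps above can be sketched in plain Python. This is a minimal illustration, not the code Milvus or any client library uses: the whitespace tokenizer, the helper names (`build_stats`, `doc_embedding`, `query_embedding`, `inner_product`), the defaults `k1=1.2` and `b=0.75`, and the choice to put IDF on the query side and the length-normalized term frequency on the document side are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer, for illustration only.
    return text.lower().split()


def build_stats(corpus):
    """Step 1: scan the whole corpus to collect IDF and average doc length."""
    docs = [tokenize(d) for d in corpus]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}
    return idf, avgdl


def doc_embedding(text, avgdl, k1=1.2, b=0.75):
    """Step 2: sparse document vector of length-normalized TF weights."""
    tokens = tokenize(text)
    tf = Counter(tokens)
    norm = k1 * (1 - b + b * len(tokens) / avgdl)
    return {t: f * (k1 + 1) / (f + norm) for t, f in tf.items()}


def query_embedding(text, idf):
    """Step 3a: sparse query vector holding the IDF of each known term."""
    return {t: idf[t] for t in set(tokenize(text)) if t in idf}


def inner_product(a, b):
    """Step 3b: IP metric between two sparse vectors stored as dicts."""
    return sum(w * b[t] for t, w in a.items() if t in b)


corpus = [
    "milvus is a vector database",
    "bm25 ranks documents by term relevance",
    "sparse vectors can encode bm25 weights",
]
idf, avgdl = build_stats(corpus)
doc_vecs = [doc_embedding(d, avgdl) for d in corpus]
query_vec = query_embedding("bm25 relevance", idf)
scores = [inner_product(query_vec, v) for v in doc_vecs]
best = max(range(len(corpus)), key=scores.__getitem__)
```

With this split, the inner product of a query vector and a document vector equals the BM25 score of that pair. Note that `idf` and `avgdl` are frozen at the time `build_stats` runs, so every stored document vector depends on a snapshot of the corpus.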

This approach is cumbersome, and the precomputed statistics and embeddings are hard to keep up to date once the corpus has changed substantially.

We propose a new approach: let users insert raw text only, and have Milvus maintain the statistics and perform the conversion at runtime.
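
To make the idea concrete, here is a toy in-memory sketch of "insert text only, maintain statistics, convert at runtime". It is not the proposed Milvus design or API: the `TextIndex` class, its `insert` and `search` methods, the whitespace tokenizer, and the defaults `k1=1.2` and `b=0.75` are hypothetical names and values chosen for illustration.

```python
import math
from collections import Counter


class TextIndex:
    """Hypothetical store that keeps BM25 statistics current on every insert."""

    def __init__(self, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.docs = []        # raw term frequencies per document
        self.df = Counter()   # document frequency per term
        self.total_len = 0    # total token count, used for the average length

    def insert(self, text):
        # Only raw text comes in; statistics are updated incrementally.
        tf = Counter(text.lower().split())
        self.docs.append(tf)
        self.df.update(tf.keys())
        self.total_len += sum(tf.values())

    def search(self, query, limit=3):
        # Weights are computed at query time from the current statistics.
        n = len(self.docs)
        if n == 0:
            return []
        avgdl = self.total_len / n
        terms = set(query.lower().split())
        scored = []
        for doc_id, tf in enumerate(self.docs):
            dl = sum(tf.values())
            score = 0.0
            for t in terms:
                f = tf.get(t, 0)
                if f == 0:
                    continue
                idf = math.log(1 + (n - self.df[t] + 0.5) / (self.df[t] + 0.5))
                norm = self.k1 * (1 - self.b + self.b * dl / avgdl)
                score += idf * f * (self.k1 + 1) / (f + norm)
            if score > 0:
                scored.append((score, doc_id))
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:limit]]


index = TextIndex()
index.insert("milvus is a vector database")
index.insert("bm25 ranks documents by term relevance")
first_hits = index.search("bm25 relevance")
index.insert("sparse vectors can encode bm25 weights")
later_hits = index.search("bm25 relevance")
```

Because nothing derived from corpus-wide statistics is stored per document, a new insert never invalidates earlier entries; the caller does not have to recompute or reinsert embeddings when the corpus grows.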

Proposed approaches and APIs will be shared shortly.

This is the umbrella issue for all related follow-up issues and PRs.

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

@zhengbuqian zhengbuqian added the kind/feature Issues related to feature request from users label Aug 30, 2024
@xiaofan-luan
Collaborator

excited about this!

sre-ci-robot pushed a commit that referenced this issue Sep 12, 2024
relate: #35853
Support create collection with functions. Prepare for support bm25
function.

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
sre-ci-robot pushed a commit that referenced this issue Sep 19, 2024
…36036)

relate: #35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
sre-ci-robot pushed a commit that referenced this issue Oct 11, 2024
relate: #35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
sre-ci-robot pushed a commit to milvus-io/pymilvus that referenced this issue Oct 11, 2024
currently only BM25 Function is supported.

issue: milvus-io/milvus#35853

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this issue Oct 12, 2024
issue: #35853 and
#35856

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this issue Oct 13, 2024
issue: #35853

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this issue Oct 13, 2024
issue: #35853

* BM25 Function now takes no params, k1, b should be passed via index
params
* support BM25 full text search when metric type is not present in
search request
* add more strict validation with functions at collection creation time

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this issue Oct 14, 2024
relate: #35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
sre-ci-robot pushed a commit that referenced this issue Oct 16, 2024
issue: #35853

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this issue Oct 16, 2024
…5 field exist. (#36886)

relate: #35853

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
sre-ci-robot pushed a commit that referenced this issue Oct 16, 2024
issue: #36883,
#35853

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this issue Nov 6, 2024
…37048)

relate: #35853
#36751

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
sre-ci-robot pushed a commit that referenced this issue Nov 7, 2024
issue: #35853

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this issue Nov 10, 2024
relate: #35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
congqixia added a commit to congqixia/milvus that referenced this issue Nov 11, 2024
Related to milvus-io#35853

This PR contains following changes:

- Add function and related proto and helper functions
- Remove the insert column missing check and leave it to server
- Add text as search input data
- Add some unit tests for logic above

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
sre-ci-robot pushed a commit that referenced this issue Nov 12, 2024
Related to #35853

This PR contains following changes:

- Add function and related proto and helper functions
- Remove the insert column missing check and leave it to server
- Add text as search input data
- Add some unit tests for logic above

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
sre-ci-robot pushed a commit that referenced this issue Nov 14, 2024
…37494)

relate: #35853

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
sre-ci-robot pushed a commit that referenced this issue Nov 14, 2024
relate: #35853

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
sre-ci-robot pushed a commit that referenced this issue Nov 14, 2024
relate: #35853

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
3 participants