[Bug]: When function is added in schema, the insert interface will call describe collection, resulting in a describe collection call every time an insert is performed. #37622

Open
1 task done
zhuwenxing opened this issue Nov 12, 2024 · 8 comments
Labels
feature/full text search kind/bug Issues or changes related a bug priority/urgent Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on.

@zhuwenxing
Contributor

zhuwenxing commented Nov 12, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:master
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):2.5.0rc118
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

image

Expected Behavior

collection info should be cached in client

Steps To Reproduce

No response

Milvus Log

from pymilvus import (connections, Collection, FieldSchema, CollectionSchema, DataType, FunctionType, list_collections,
                      Function)
from loguru import logger
import time
from faker import Faker
faker = Faker()
connections.connect(host = "10.104.16.127")

collection_name = "demo_collection"
analyzer_params = {
    "tokenizer": "standard"
}
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=25536,
                enable_analyzer=True, analyzer_params=analyzer_params, enable_match=True),
    FieldSchema(name="sparse", dtype=DataType.SPARSE_FLOAT_VECTOR),
]
schema = CollectionSchema(fields=fields, description="beir test collection")
bm25_function = Function(
    name="text_bm25_emb",
    function_type=FunctionType.BM25,
    input_field_names=["text"],
    output_field_names=["sparse"],
    params={},
)
schema.add_function(bm25_function)
collection = Collection(collection_name, schema)

collection.create_index(
    "sparse",
    {
        "index_type": "SPARSE_INVERTED_INDEX",
        "metric_type": "BM25",
        "params": {
            "bm25_k1": 1.5,
            "bm25_b": 0.75,
        }
    }
)
collection.load()
logger.info("Collection setup completed successfully")

batch_size = 10
data = [
    {
        "id": int(time.time() * (10 ** 6)),
        "text": faker.text(max_nb_chars=300),
    }
    for _ in range(batch_size)
]
for i in range(1000):
    collection.insert(data)
    print(f"Inserting {batch_size} vectors")

Anything else?

No response

@zhuwenxing zhuwenxing added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 12, 2024
@zhuwenxing zhuwenxing added this to the 2.5.0 milestone Nov 12, 2024
@zhuwenxing
Contributor Author

/assign @zhengbuqian

@xiaofan-luan
Collaborator

mark this as critical

@xiaofan-luan xiaofan-luan added the priority/urgent Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Nov 12, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 13, 2024
@yanliang567
Contributor

This also reproduces with a schema that has no Function defined.

@zhengbuqian
Collaborator

/unassign

/assign @XuanYang-cn

Assigning to XuanYang-cn, as this also reproduces without Function.

@yanliang567
Contributor

/assign @zhengbuqian
sorry for false alarm, it only reproduces with Function.

@zhengbuqian
Collaborator

https://github.com/milvus-io/pymilvus/blob/5ec0a601bb6683c787d9bdb88ee35f1c49a49586/pymilvus/client/grpc_handler.py#L484-L516

_prepare_row_insert_request checks whether schema is a dict; if it is not, it fetches the schema from the server.

When insert_rows calls _prepare_row_insert_request, schema is not passed in the argument list. In fact, if a timeout is provided to insert_rows, the timeout is passed as the schema argument of _prepare_row_insert_request.

That is why the check if not isinstance(schema, dict): is always true, and pymilvus always calls the describe_collection RPC on each insert.
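
To make the mechanism concrete, here is a minimal, self-contained sketch of the pattern described above. It is not the pymilvus source: FakeHandler, insert_rows_buggy, insert_rows_fixed, and the returned dicts are hypothetical stand-ins, and describe_collection here only counts calls instead of issuing an RPC. It assumes the bug comes down to a positional argument landing in the schema slot, as the comment above describes; how the actual fix is implemented is not shown here.

```python
class FakeHandler:
    """Hypothetical stand-in for a gRPC client handler; not pymilvus code."""

    def __init__(self) -> None:
        self.describe_calls = 0

    def describe_collection(self, collection_name):
        # Stands in for the describe_collection RPC; counts round trips.
        self.describe_calls += 1
        return {"collection_name": collection_name, "fields": []}

    def _prepare_row_insert_request(self, collection_name, rows, schema=None, timeout=None):
        # Same kind of check as described above: anything that is not a
        # dict triggers a schema fetch from the server.
        if not isinstance(schema, dict):
            schema = self.describe_collection(collection_name)
        return {"collection_name": collection_name, "rows": rows, "schema": schema}

    def insert_rows_buggy(self, collection_name, rows, timeout=None):
        # Bug pattern: timeout is passed positionally and lands in the
        # schema slot, so schema is never a dict.
        return self._prepare_row_insert_request(collection_name, rows, timeout)

    def insert_rows_fixed(self, collection_name, rows, schema=None, timeout=None):
        # Keyword arguments keep schema and timeout in their own slots.
        return self._prepare_row_insert_request(
            collection_name, rows, schema=schema, timeout=timeout
        )


rows = [{"id": 1, "text": "hello"}]

buggy = FakeHandler()
for _ in range(5):
    buggy.insert_rows_buggy("demo_collection", rows, timeout=10)

fixed = FakeHandler()
cached_schema = fixed.describe_collection("demo_collection")  # fetched once
for _ in range(5):
    fixed.insert_rows_fixed("demo_collection", rows, schema=cached_schema, timeout=10)

print(buggy.describe_calls, fixed.describe_calls)  # prints: 5 1
```

In this sketch the buggy path describes the collection once per insert, while the fixed path reuses a schema dict that was fetched a single time, which matches the expected behavior of caching collection info in the client.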

sre-ci-robot pushed a commit to milvus-io/pymilvus that referenced this issue Nov 14, 2024
issue: milvus-io/milvus#37622

Signed-off-by: Ubuntu <ubuntu@ip-10-15-107-238.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-10-15-107-238.us-west-2.compute.internal>
@zhengbuqian
Collaborator

/unassign

should be fixed by milvus-io/pymilvus#2347 (master branch) or milvus-io/pymilvus#2348 (2.4 branch)

> sorry for false alarm, it only reproduces with Function.

@yanliang567 see the fix: the bug is not related to Function and should reproduce without Function.

@XuanYang-cn
Contributor

@yanliang567 It seems that inserting by rows always calls describe collection.
/assign @zhuwenxing
/unassign
Please help verify, THX
