Description
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: v2.5.4
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.4
- OS(Ubuntu or CentOS): RockyLinux
- CPU/Memory:
- GPU:
- Others:
Current Behavior
After upgrading from Milvus v2.5.3 to v2.5.4, one collection in particular got stuck while loading. Unfortunately, this collection holds very important data, so the failure has become a blocker for our developers.
The querynode logs indicate that an index file named index_null_offset was missing.
[2025/02/13 10:51:10.700 +00:00] [WARN] [cluster/worker.go:105] ["failed to call LoadSegments via grpc worker"] [traceID=e8990d64d2ae223d2f828c7fae6fa364] [workerID=2197] [error="At LoadSegment: Error in GetObjectSize[errcode:404, exception:, errmessage:No response body., params:params, bucket=milvus-bucket, object=file/index_files/455538097355005125/0/454664253305745956/455538097355005123/index_null_offset]"]
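For anyone triaging similar warnings, the sketch below pulls the error code, bucket, and object key out of a line like the one above so the missing file can be checked against object storage. It only relies on the `errcode:`, `bucket=`, and `object=` fragments visible in this particular log line. The helper name `parse_missing_object` is hypothetical, and the pattern is an assumption about the message layout rather than a documented Milvus log format.

```python
import re
from typing import Optional

# Error fragment copied from the querynode warning in this report.
SAMPLE = (
    '[error="At LoadSegment: Error in GetObjectSize[errcode:404, exception:, '
    'errmessage:No response body., params:params, bucket=milvus-bucket, '
    'object=file/index_files/455538097355005125/0/454664253305745956/'
    '455538097355005123/index_null_offset]"]'
)

# Assumed layout: "errcode:<int> ... bucket=<name>, object=<key>]".
_PATTERN = re.compile(
    r"errcode:(?P<errcode>\d+).*?"
    r"bucket=(?P<bucket>[^,\s\]]+),\s*"
    r"object=(?P<object>[^,\s\]\"]+)"
)


def parse_missing_object(log_line: str) -> Optional[dict]:
    """Return errcode, bucket, object key, and file name, or None if no match."""
    match = _PATTERN.search(log_line)
    if match is None:
        return None
    object_key = match.group("object")
    return {
        "errcode": int(match.group("errcode")),
        "bucket": match.group("bucket"),
        "object": object_key,
        # Last path component, i.e. the index file reported as missing.
        "file_name": object_key.rsplit("/", 1)[-1],
    }
```

Calling `parse_missing_object(SAMPLE)` yields the bucket and full object key, which can then be looked up with whatever object-storage client the deployment uses.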
I searched the existing issues and found one very similar issue, but it seems that problem was not fully resolved in v2.5.4 (I tried dropping and re-creating the indexes, to no avail). Another related issue is this one, which also reported files missing from the index_files folder.
The collection that got stuck at loading uses the new Full Text Search feature in v2.5, which uses the BM25 algorithm to automatically convert raw text into sparse vectors. I'm not sure whether this helps identify the root cause. Here is pseudocode for its schema:
from pymilvus import CollectionSchema, DataType, FieldSchema, Function, FunctionType

# Primary key: caller-supplied string IDs (no auto-generation).
id = FieldSchema(
    name="id",
    dtype=DataType.VARCHAR,
    max_length=36,
    is_primary=True,
    auto_id=False,
)
# Dense embedding.
vector = FieldSchema(
    name="vector",
    dtype=DataType.FLOAT_VECTOR,
    dim=1536,
)
# Used as the partition key below.
year_month = FieldSchema(
    name="year_month",
    dtype=DataType.INT64,
)
# Raw text; the analyzer is enabled so BM25 can tokenize it.
text = FieldSchema(
    name="text",
    dtype=DataType.VARCHAR,
    max_length=65535,
    enable_analyzer=True,
    enable_match=True,
)
# Sparse vector produced by the BM25 function.
sparse_vector = FieldSchema(
    name="sparse_vector",
    dtype=DataType.SPARSE_FLOAT_VECTOR,
)
bm25_function = Function(
    name="text_bm25_emb",
    input_field_names=["text"],
    output_field_names=["sparse_vector"],
    function_type=FunctionType.BM25,
)
schema = CollectionSchema(
    fields=[id, vector, year_month, text, sparse_vector],
    description="test",
    enable_dynamic_field=True,
    partition_key_field="year_month",
)
Expected Behavior
The collection loads successfully after upgrading to v2.5.4.
Steps To Reproduce
Create a collection in v2.5.3 or lower using the schema above, then upgrade Milvus to v2.5.4 and check whether the collection can be loaded.
Milvus Log
No response
Anything else?
No response