
[Bug]: [Urgent] Failed to LoadSegment because index file index_null_offset is missing #39881

Open
@Andy6132024

Description

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.5.4
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka 
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.4
- OS(Ubuntu or CentOS): RockyLinux
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

After upgrading Milvus from v2.5.3 to v2.5.4, one collection in particular got stuck while loading. Unfortunately, this collection holds very important data, and the problem has become a blocker for our developers.

The querynode logs indicate that an index file named index_null_offset is missing:

[2025/02/13 10:51:10.700 +00:00] [WARN] [cluster/worker.go:105] ["failed to call LoadSegments via grpc worker"] [traceID=e8990d64d2ae223d2f828c7fae6fa364] [workerID=2197] [error="At LoadSegment: Error in GetObjectSize[errcode:404, exception:, errmessage:No response body., params:params, bucket=milvus-bucket, object=file/index_files/455538097355005125/0/454664253305745956/455538097355005123/index_null_offset]"]
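The object key in this error appears to encode which index build and segment the missing file belongs to. Below is a minimal sketch for pulling those IDs out of the log line so the matching prefix can be inspected in the bucket (for example, to see which sibling index files do exist). The layout <root>/index_files/<buildID>/<indexVersion>/<partitionID>/<segmentID>/<file> is my assumption from reading the key, not something the log confirms, and IndexFileKey and parse_missing_index_file are hypothetical helpers.

```python
import re
from typing import NamedTuple, Optional


class IndexFileKey(NamedTuple):
    bucket: str
    root_path: str
    build_id: int
    index_version: int
    partition_id: int
    segment_id: int
    file_name: str

    def prefix(self) -> str:
        """Object-storage prefix that should hold this segment's index files."""
        return (f"{self.root_path}/index_files/{self.build_id}/"
                f"{self.index_version}/{self.partition_id}/{self.segment_id}/")


# ASSUMED key layout: <root>/index_files/<buildID>/<indexVersion>/<partitionID>/<segmentID>/<file>
_OBJECT_RE = re.compile(
    r"bucket=(?P<bucket>[^,\]]+), object=(?P<root>[^\s\]]*?)/index_files/"
    r"(?P<build>\d+)/(?P<ver>\d+)/(?P<part>\d+)/(?P<seg>\d+)/(?P<file>[^\]\s]+)"
)


def parse_missing_index_file(log_line: str) -> Optional[IndexFileKey]:
    """Extract the bucket and key components of the missing object from a log line."""
    m = _OBJECT_RE.search(log_line)
    if m is None:
        return None
    return IndexFileKey(
        bucket=m["bucket"],
        root_path=m["root"],
        build_id=int(m["build"]),
        index_version=int(m["ver"]),
        partition_id=int(m["part"]),
        segment_id=int(m["seg"]),
        file_name=m["file"],
    )
```

Run on the log line above, this yields bucket milvus-bucket, build ID 455538097355005125, segment ID 455538097355005123, and file name index_null_offset.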

I searched the existing issues and found a very similar one, but it appears not to be fully resolved in v2.5.4 (I tried dropping and re-creating the indexes, to no avail). Another related issue also reports missing files in the index_files folder.

The collection that is stuck uses Full Text Search, a new feature in v2.5 that uses the BM25 algorithm to automatically convert raw text into sparse vectors. I'm not sure whether this helps identify the root cause. Here is its schema:

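from pymilvus import CollectionSchema, DataType, FieldSchema, Function, FunctionType

# Primary key: client-supplied VARCHAR IDs
id = FieldSchema(
    name="id",
    dtype=DataType.VARCHAR,
    max_length=36,
    is_primary=True,
    auto_id=False,
)
# Dense embedding
vector = FieldSchema(
    name="vector",
    dtype=DataType.FLOAT_VECTOR,
    dim=1536,
)
# Used as the partition key below
year_month = FieldSchema(
    name="year_month",
    dtype=DataType.INT64,
)
# Raw text; the analyzer tokenizes it for BM25 and text match
text = FieldSchema(
    name="text",
    dtype=DataType.VARCHAR,
    max_length=65535,
    enable_analyzer=True,
    enable_match=True,
)
# Populated by the BM25 function, not by the client
sparse_vector = FieldSchema(
    name="sparse_vector",
    dtype=DataType.SPARSE_FLOAT_VECTOR,
)

# Full Text Search: converts "text" into BM25 sparse vectors stored in "sparse_vector"
bm25_function = Function(
    name="text_bm25_emb",
    input_field_names=["text"],
    output_field_names=["sparse_vector"],
    function_type=FunctionType.BM25,
)

schema = CollectionSchema(
    fields=[id, vector, year_month, text, sparse_vector],
    description="test",
    enable_dynamic_field=True,
    partition_key_field="year_month",
)
# Attach the BM25 function to the schema
schema.add_function(bm25_function)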
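For context on what the BM25 function in the schema above computes, here is a rough illustration of textbook BM25 term weighting in plain Python, turning each document into a sparse {term: weight} vector. This is only a sketch of the general idea: the whitespace tokenizer, the parameters k1=1.2 and b=0.75, and the helper name bm25_sparse_vectors are my assumptions, and Milvus's real analyzer, token hashing, and the point at which IDF is applied may well differ.

```python
import math
from collections import Counter


def bm25_sparse_vectors(docs, k1=1.2, b=0.75):
    """Return one {term: weight} dict per document using textbook BM25 (illustrative only)."""
    if not docs:
        return []
    # Assumed tokenizer: lowercase + whitespace split (Milvus uses its own analyzer)
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n or 1.0  # avoid divide-by-zero
    # Document frequency: number of documents containing each term
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        vec = {}
        for term, freq in tf.items():
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl))
            vec[term] = idf * norm
        vectors.append(vec)
    return vectors
```

Terms that appear in every document (such as "milvus" in ["milvus index file", "milvus load segment"]) get a lower weight than rarer terms, which is what makes the resulting sparse vectors useful for keyword-style ranking.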

Expected Behavior

The collection loads successfully after upgrading to v2.5.4.

Steps To Reproduce

Create a collection in v2.5.3 or lower using the schema above, then upgrade Milvus to v2.5.4 and check whether the collection can be loaded.
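To make the final check scriptable, here is a minimal sketch that attempts the load and captures the error text. It assumes a pymilvus MilvusClient, whose load_collection accepts collection_name and timeout; try_load and is_missing_null_offset are hypothetical helpers, and the client is passed in as a parameter so it can be replaced by a stub.

```python
from typing import Optional, Tuple


def try_load(client, collection_name: str, timeout: float = 300.0) -> Tuple[bool, Optional[str]]:
    """Try to load a collection; return (True, None) on success or (False, error text)."""
    try:
        # Assumed: MilvusClient.load_collection blocks until the load finishes or fails
        client.load_collection(collection_name=collection_name, timeout=timeout)
    except Exception as exc:  # server-side failures surface as exceptions
        return False, str(exc)
    return True, None


def is_missing_null_offset(error: Optional[str]) -> bool:
    """True if the load error mentions the missing index_null_offset file."""
    return error is not None and "index_null_offset" in error
```

Usage against a live cluster might look like `client = MilvusClient(uri="http://localhost:19530")` followed by `ok, err = try_load(client, "<collection>")`, where the URI and collection name are placeholders.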

Milvus Log

No response

Anything else?

No response

Labels

kind/bug: Issues or changes related to a bug
triage/needs-information: Indicates an issue needs more information in order to work on it.
