MilvusStore.upsert() reports success but writes are not durable on remote Milvus 2.5+ (missing flush) #534

@Ronenzu

Summary

MilvusStore.upsert() returns len(chunks) as a fallback success count even when the underlying upsert produced no durable rows in Milvus. The memsearch CLI then reports e.g. Indexed 219 chunks while get_collection_stats() shows row_count: 0. All subsequent searches return empty results because nothing was actually queryable.

The issue surfaces specifically against remote Milvus 2.5+ standalone. Milvus-lite (the embedded path) appears to auto-flush, masking the bug.

Environment

  • memsearch: 0.4.2 (latest as of 2026-05-09, installed via claude plugin install memsearch)
  • Milvus: milvusdb/milvus:v2.5.4 standalone (Docker, embedded etcd)
  • pymilvus: bundled with memsearch
  • Client: Windows 11 (uvx installed) connecting to remote Milvus on Linux VM via milvus.uri = "http://<vm>:19530"
  • Embeddings: embedding.provider = "onnx" (all-MiniLM-L6-v2, dim=384)

Reproduction

  1. Run a remote Milvus 2.5.x standalone:

    docker run -d --name milvus -p 19530:19530 -p 9091:9091 \
      -e ETCD_USE_EMBED=true -e ETCD_DATA_DIR=/var/lib/milvus/etcd \
      -e ETCD_CONFIG_PATH=/milvus/configs/embedEtcd.yaml \
      -e COMMON_STORAGETYPE=local \
      -v $PWD/milvus/volumes/milvus:/var/lib/milvus \
      -v $PWD/milvus/embedEtcd.yaml:/milvus/configs/embedEtcd.yaml \
      milvusdb/milvus:v2.5.4 milvus run standalone
  2. Configure memsearch (~/.memsearch/config.toml):

    [milvus]
    uri = "http://<vm-ip>:19530"
    
    [embedding]
    provider = "onnx"
  3. Index a project:

    memsearch index <project-path>
    # Output: Indexed 219 chunks across 47 files (or similar)
  4. Inspect Milvus:

    from pymilvus import MilvusClient
    c = MilvusClient(uri='http://<vm-ip>:19530')
    print(c.get_collection_stats('memsearch_chunks'))
    # Actual:   {'row_count': 0}
    # Expected: {'row_count': 219}
  5. Search returns nothing:

    memsearch search "anything"
    # No matches found.

Evidence the bug is in memsearch, not Milvus or pymilvus

I ran a direct minimal pymilvus test against the same Milvus instance, with the same schema memsearch uses (BM25 sparse function + 384-dim FLOAT_VECTOR + COSINE + SPARSE_INVERTED_INDEX):

from pymilvus import MilvusClient, DataType, Function, FunctionType
client = MilvusClient(uri='http://<vm>:19530')
# (schema setup — same shape as memsearch._ensure_collection)
client.upsert(collection_name=COL, data=[{
    'chunk_hash': 'test1',
    'embedding': [0.1] * 384,
    'content': 'hello world',
    'file_path': 'test.md',
    'chunk_index': 0,
    'tags': '',
    'metadata': '{}'
}])
client.flush(COL)
print(client.get_collection_stats(COL))

Result: {'row_count': 1}. Direct path works. The minimal change vs. memsearch is the explicit flush() call.

Root cause

memsearch/store.py:125-137:

def upsert(self, chunks):
    if not chunks:
        return 0
    result = self._client.upsert(collection_name=self._collection, data=chunks)
    return result.get("upsert_count", len(chunks)) if isinstance(result, dict) else len(chunks)

Two issues:

  1. No flush() after upsert. Milvus 2.5+ does not auto-flush in the standalone path. Data sits in the in-memory write buffer and is invisible to queries until a flush, segment seal, or compaction. Older Milvus versions (and milvus-lite) auto-flushed for small writes — masking this gap.

  2. Silent fallback to len(chunks). If Milvus returns a dict without upsert_count (e.g. on the 2.5+ schema-mismatch error path some configs hit), the function still returns the requested count as success. core.py:194 (return self._store.upsert(records)) then reports it back to the user. There's no integrity check that row_count increased.
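To make the second issue concrete, here is a minimal sketch that isolates the return expression quoted above into a standalone function. The name count_from_result and the sample response values are mine, for illustration only; they are not taken from memsearch or from real server output.

```python
def count_from_result(result, requested):
    """Same expression as the return line quoted from store.py above."""
    return result.get("upsert_count", requested) if isinstance(result, dict) else requested


# Hypothetical responses; each call asks for 219 chunks to be written.
print(count_from_result({"upsert_count": 219}, 219))  # 219: server confirmed the writes
print(count_from_result({}, 219))                      # 219: key missing, fallback hides it
print(count_from_result(None, 219))                    # 219: not a dict, fallback hides it
print(count_from_result({"upsert_count": 0}, 219))     # 0: only an explicit zero gets through
```

Only a response that explicitly carries upsert_count: 0 produces a non-success number. Every other malformed or partial response is reported as a full success.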

Suggested fix

def upsert(self, chunks):
    if not chunks:
        return 0
    result = self._client.upsert(collection_name=self._collection, data=chunks)
    self._client.flush(self._collection)  # Required for Milvus 2.5+ standalone

    # Trust upsert_count over len(chunks) — fail loud if the server says 0
    actual = result.get("upsert_count") if isinstance(result, dict) else None
    if actual is None:
        actual = len(chunks)
    if actual == 0 and len(chunks) > 0:
        raise RuntimeError(f"Upsert reported 0 writes for {len(chunks)} chunks (silent failure)")
    return actual

Optional follow-up: a periodic batch flush (every N chunks or on index completion) instead of per-call, to amortize the cost on large indexing runs.

Impact

This is silent data loss from the user's perspective — the CLI reports success, the ~/.memsearch/<project>/chunks/ markdown layer is intact, but the vector index is empty so all searches return nothing. Diagnosis required reading pymilvus server logs to confirm zero Insert/Upsert RPCs were ever executed, despite memsearch's claim of 219 successful writes.

Workaround for users

Until fixed, run client.flush('memsearch_chunks') manually after every memsearch index call.
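For convenience, the workaround can be wrapped in a small helper that also confirms the rows landed. This is a sketch: flush_and_count is a name I made up, and it assumes get_collection_stats returns a dict with a row_count key, as in the output shown in the reproduction.

```python
def flush_and_count(client, collection="memsearch_chunks"):
    """Flush the collection, then return the row count reported by its stats."""
    client.flush(collection)
    stats = client.get_collection_stats(collection)
    # Assumption: stats looks like {'row_count': N}, as in the reproduction output.
    return int(stats.get("row_count", 0))


# Usage against a live server (not executed here):
#   from pymilvus import MilvusClient
#   client = MilvusClient(uri="http://<vm-ip>:19530")
#   print(flush_and_count(client))
```

If the printed count is still 0 after a memsearch index run, the writes never reached Milvus at all and flushing alone will not help.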


Filed by a Claude Code user debugging memsearch + remote Milvus integration. Happy to test a patch if you'd like.
