Summary
MilvusStore.upsert() returns len(chunks) as a fallback success count even when the underlying upsert produced no durable rows in Milvus. The memsearch CLI then reports e.g. Indexed 219 chunks while get_collection_stats() shows row_count: 0. All subsequent searches return empty results because nothing was actually queryable.
The issue surfaces specifically against remote Milvus 2.5+ standalone. Milvus-lite (the embedded path) appears to auto-flush, masking the bug.
Environment
- memsearch: 0.4.2 (latest as of 2026-05-09, installed via claude plugin install memsearch)
- Milvus: milvusdb/milvus:v2.5.4 standalone (Docker, embedded etcd)
- pymilvus: bundled with memsearch
- Client: Windows 11 (uvx installed) connecting to remote Milvus on Linux VM via milvus.uri = "http://<vm>:19530"
- Embeddings: embedding.provider = "onnx" (all-MiniLM-L6-v2, dim=384)
Reproduction
- Run a remote Milvus 2.5.x standalone:
  docker run -d --name milvus -p 19530:19530 -p 9091:9091 \
    -e ETCD_USE_EMBED=true -e ETCD_DATA_DIR=/var/lib/milvus/etcd \
    -e ETCD_CONFIG_PATH=/milvus/configs/embedEtcd.yaml \
    -e COMMON_STORAGETYPE=local \
    -v $PWD/milvus/volumes/milvus:/var/lib/milvus \
    -v $PWD/milvus/embedEtcd.yaml:/milvus/configs/embedEtcd.yaml \
    milvusdb/milvus:v2.5.4 milvus run standalone
- Configure memsearch (~/.memsearch/config.toml):
  [milvus]
  uri = "http://<vm-ip>:19530"
  [embedding]
  provider = "onnx"
- Index a project:
  memsearch index <project-path>
  # Output: Indexed 219 chunks across 47 files (or similar)
- Inspect Milvus:
  from pymilvus import MilvusClient
  c = MilvusClient(uri='http://<vm-ip>:19530')
  print(c.get_collection_stats('memsearch_chunks'))
  # Actual: {'row_count': 0}
  # Expected: {'row_count': 219}
- Search returns nothing:
  memsearch search "anything"
  # No matches found.
Evidence the bug is in memsearch, not Milvus or pymilvus
I ran a direct minimal pymilvus test against the same Milvus instance, with the same schema memsearch uses (BM25 sparse function + 384-dim FLOAT_VECTOR + COSINE + SPARSE_INVERTED_INDEX):
from pymilvus import MilvusClient, DataType, Function, FunctionType
client = MilvusClient(uri='http://<vm>:19530')
# (schema setup — same shape as memsearch._ensure_collection)
client.upsert(collection_name=COL, data=[{
    'chunk_hash': 'test1',
    'embedding': [0.1] * 384,
    'content': 'hello world',
    'file_path': 'test.md',
    'chunk_index': 0,
    'tags': '',
    'metadata': '{}',
}])
client.flush(COL)
print(client.get_collection_stats(COL))
Result: {'row_count': 1}. The direct path works. The key difference from memsearch's upsert path is the explicit flush() call.
Root cause
memsearch/store.py:125-137:
def upsert(self, chunks):
    if not chunks:
        return 0
    result = self._client.upsert(collection_name=self._collection, data=chunks)
    return result.get("upsert_count", len(chunks)) if isinstance(result, dict) else len(chunks)
Two issues:
- No flush() after upsert. Milvus 2.5+ does not auto-flush in the standalone path. Data sits in the in-memory write buffer and is invisible to queries until a flush, segment seal, or compaction. Older Milvus versions (and milvus-lite) auto-flushed for small writes, which masked this gap.
- Silent fallback to len(chunks). If Milvus returns a dict without upsert_count (e.g. on the 2.5+ schema-mismatch error path some configs hit), the function still returns the requested count as success. core.py:194 (return self._store.upsert(records)) then reports it back to the user. There's no integrity check that row_count increased.
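To make the second issue concrete, here is a minimal sketch of a stricter count check that refuses to fall back to len(chunks). The helper names confirmed_upsert_count and check_upsert are hypothetical (they are not part of memsearch), and the sketch assumes the client response is a dict carrying an upsert_count key on success, as the snippet above implies.

```python
from typing import Any, Optional


def confirmed_upsert_count(result: Any) -> Optional[int]:
    """Return the server-reported upsert count, or None if it is absent.

    Assumption: a successful response is a dict with an "upsert_count" key.
    """
    if isinstance(result, dict) and "upsert_count" in result:
        return int(result["upsert_count"])
    return None


def check_upsert(result: Any, requested: int) -> int:
    """Raise instead of silently reporting `requested` as the success count."""
    count = confirmed_upsert_count(result)
    if count is None:
        # No confirmation from the server: do not substitute the requested count.
        raise RuntimeError(
            f"Upsert response had no upsert_count; cannot confirm {requested} writes"
        )
    if count == 0 and requested > 0:
        raise RuntimeError(f"Upsert reported 0 writes for {requested} chunks")
    return count
```

With a check like this, a response without upsert_count surfaces as an error at index time rather than as a success message followed by empty searches.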
Suggested fix
def upsert(self, chunks):
    if not chunks:
        return 0
    result = self._client.upsert(collection_name=self._collection, data=chunks)
    self._client.flush(self._collection)  # Required for Milvus 2.5+ standalone
    # Trust upsert_count over len(chunks); fail loud if the server says 0
    actual = result.get("upsert_count") if isinstance(result, dict) else None
    if actual is None:
        actual = len(chunks)
    if actual == 0 and len(chunks) > 0:
        raise RuntimeError(f"Upsert reported 0 writes for {len(chunks)} chunks (silent failure)")
    return actual
Optional follow-up: a periodic batch flush (every N chunks or on index completion) instead of per-call, to amortize the cost on large indexing runs.
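The optional follow-up could look roughly like the sketch below. FlushBatcher and its flush_every parameter are hypothetical names for illustration, not existing memsearch code; the only client calls assumed are upsert(collection_name=..., data=...) and flush(collection_name), which are the ones already used in this report.

```python
class FlushBatcher:
    """Hypothetical helper: upsert on every call, flush every `flush_every` chunks."""

    def __init__(self, client, collection, flush_every=1000):
        self._client = client
        self._collection = collection
        self._flush_every = flush_every
        self._pending = 0  # chunks written since the last flush

    def upsert(self, chunks):
        if not chunks:
            return None
        result = self._client.upsert(collection_name=self._collection, data=chunks)
        self._pending += len(chunks)
        if self._pending >= self._flush_every:
            self.flush()
        return result

    def flush(self):
        # Skip the flush when nothing has been written since the last one.
        if self._pending:
            self._client.flush(self._collection)
            self._pending = 0

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Flush on exit so the tail of the run is persisted as well.
        self.flush()
        return False
```

Used as a context manager around an indexing run, this flushes once per flush_every chunks and once more on completion, so the per-call flush cost is amortized while the final partial batch is still made durable.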
Impact
This is silent data loss from the user's perspective — the CLI reports success, the ~/.memsearch/<project>/chunks/ markdown layer is intact, but the vector index is empty so all searches return nothing. Diagnosis required reading pymilvus server logs to confirm zero Insert/Upsert RPCs were ever executed, despite memsearch's claim of 219 successful writes.
Workaround for users
Until fixed, run client.flush('memsearch_chunks') manually after every memsearch index call.
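As a sketch of that workaround, the helper below flushes the collection and then reads back row_count so the result can be compared with the count the CLI printed. flush_and_count is a hypothetical name; it assumes a client object exposing flush() and get_collection_stats() as used earlier in this report, with the stats returned as a dict containing row_count.

```python
def flush_and_count(client, collection="memsearch_chunks"):
    """Flush the collection, then return the row_count the server reports."""
    client.flush(collection)
    stats = client.get_collection_stats(collection)
    # Assumption: stats is a dict such as {'row_count': 219}.
    return int(stats.get("row_count", 0))
```

For example, after memsearch index, calling flush_and_count(MilvusClient(uri='http://<vm-ip>:19530')) should return a number in line with the reported chunk count; a result of 0 indicates the writes never became durable.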
Filed by a Claude Code user debugging memsearch + remote Milvus integration. Happy to test a patch if you'd like.