MilvusStore.upsert() reports success but writes are not durable on remote Milvus 2.5+ (missing flush)

## Summary

`MilvusStore.upsert()` returns `len(chunks)` as a fallback success count even when the underlying upsert produced no durable rows in Milvus. The `memsearch` CLI then reports e.g. `Indexed 219 chunks` while `get_collection_stats()` shows `row_count: 0`. All subsequent searches return empty results because nothing was actually queryable.

The issue surfaces specifically against **remote Milvus 2.5+ standalone**. Milvus-lite (the embedded path) appears to auto-flush, masking the bug.

## Environment

- memsearch: `0.4.2` (latest as of 2026-05-09, installed via `claude plugin install memsearch`)
- Milvus: `milvusdb/milvus:v2.5.4` standalone (Docker, embedded etcd)
- pymilvus: bundled with memsearch
- Client: Windows 11 (uvx installed) connecting to remote Milvus on Linux VM via `milvus.uri = "http://<vm>:19530"`
- Embeddings: `embedding.provider = "onnx"` (all-MiniLM-L6-v2, dim=384)

## Reproduction

1. Run a remote Milvus 2.5.x standalone:
   ```bash
   docker run -d --name milvus -p 19530:19530 -p 9091:9091 \
     -e ETCD_USE_EMBED=true -e ETCD_DATA_DIR=/var/lib/milvus/etcd \
     -e ETCD_CONFIG_PATH=/milvus/configs/embedEtcd.yaml \
     -e COMMON_STORAGETYPE=local \
     -v $PWD/milvus/volumes/milvus:/var/lib/milvus \
     -v $PWD/milvus/embedEtcd.yaml:/milvus/configs/embedEtcd.yaml \
     milvusdb/milvus:v2.5.4 milvus run standalone
   ```

2. Configure memsearch (`~/.memsearch/config.toml`):
   ```toml
   [milvus]
   uri = "http://<vm-ip>:19530"

   [embedding]
   provider = "onnx"
   ```

3. Index a project:
   ```bash
   memsearch index <project-path>
   # Output: Indexed 219 chunks across 47 files (or similar)
   ```

4. Inspect Milvus:
   ```python
   from pymilvus import MilvusClient
   c = MilvusClient(uri='http://<vm-ip>:19530')
   print(c.get_collection_stats('memsearch_chunks'))
   # Actual:   {'row_count': 0}
   # Expected: {'row_count': 219}
   ```

5. Search returns nothing:
   ```bash
   memsearch search "anything"
   # No matches found.
   ```

## Evidence the bug is in memsearch, not Milvus or pymilvus

I ran a direct minimal pymilvus test against the same Milvus instance, with the same schema memsearch uses (BM25 sparse function + 384-dim FLOAT_VECTOR + COSINE + SPARSE_INVERTED_INDEX):

```python
from pymilvus import MilvusClient, DataType, Function, FunctionType
client = MilvusClient(uri='http://<vm>:19530')
# (schema setup — same shape as memsearch._ensure_collection)
client.upsert(collection_name=COL, data=[{
    'chunk_hash': 'test1',
    'embedding': [0.1] * 384,
    'content': 'hello world',
    'file_path': 'test.md',
    'chunk_index': 0,
    'tags': '',
    'metadata': '{}'
}])
client.flush(COL)
print(client.get_collection_stats(COL))
```

Result: `{'row_count': 1}`. **Direct path works.** The only difference from memsearch's write path is the explicit `flush()` call.

## Root cause

`memsearch/store.py:125-137`:

```python
def upsert(self, chunks):
    if not chunks:
        return 0
    result = self._client.upsert(collection_name=self._collection, data=chunks)
    return result.get("upsert_count", len(chunks)) if isinstance(result, dict) else len(chunks)
```

Two issues:

1. **No `flush()` after upsert.** Milvus 2.5+ does not auto-flush in the standalone path. Data sits in the in-memory write buffer and is invisible to queries until a flush, segment seal, or compaction. Older Milvus versions (and milvus-lite) auto-flushed for small writes — masking this gap.

2. **Silent fallback to `len(chunks)`.** If Milvus returns a dict without `upsert_count` (e.g. on the 2.5+ schema-mismatch error path some configs hit), the function still returns the *requested* count as success. `core.py:194` (`return self._store.upsert(records)`) then reports it back to the user. There's no integrity check that `row_count` increased.
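The fallback in issue 2 can be shown without a Milvus server. The sketch below copies the return logic from the excerpt above into a standalone function and drives it with `FakeClient`, a hypothetical stub that is not part of pymilvus or memsearch. The response shapes (`{}`, `None`) are assumptions chosen to exercise the fallback, not captured server responses.

```python
class FakeClient:
    """Hypothetical stand-in for the Milvus client; returns a canned response."""

    def __init__(self, response):
        self._response = response

    def upsert(self, collection_name, data):
        # Nothing is stored; the canned response is returned as-is.
        return self._response


def upsert_as_quoted(client, collection, chunks):
    """Same return logic as the store.py excerpt, lifted out of the class."""
    if not chunks:
        return 0
    result = client.upsert(collection_name=collection, data=chunks)
    return result.get("upsert_count", len(chunks)) if isinstance(result, dict) else len(chunks)


chunks = [{"chunk_hash": f"h{i}"} for i in range(3)]

# A dict without "upsert_count" is reported as 3 successful writes.
no_count = upsert_as_quoted(FakeClient({}), "memsearch_chunks", chunks)
# A non-dict response is also reported as 3 successful writes.
non_dict = upsert_as_quoted(FakeClient(None), "memsearch_chunks", chunks)
# Only an explicit count from the server is passed through unchanged.
explicit_zero = upsert_as_quoted(FakeClient({"upsert_count": 0}), "memsearch_chunks", chunks)

print(no_count, non_dict, explicit_zero)
```

In both fallback cases the caller cannot distinguish "3 rows written" from "the response carried no count at all".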

## Suggested fix

```python
def upsert(self, chunks):
    if not chunks:
        return 0
    result = self._client.upsert(collection_name=self._collection, data=chunks)
    self._client.flush(self._collection)  # Required for Milvus 2.5+ standalone

    # Trust upsert_count over len(chunks) — fail loud if the server says 0
    actual = result.get("upsert_count") if isinstance(result, dict) else None
    if actual is None:
        actual = len(chunks)
    if actual == 0 and len(chunks) > 0:
        raise RuntimeError(f"Upsert reported 0 writes for {len(chunks)} chunks (silent failure)")
    return actual
```

Optional follow-up: a periodic batch flush (every N chunks or on `index` completion) instead of per-call, to amortize the cost on large indexing runs.
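A minimal sketch of that batched variant, assuming a wrapper that counts pending chunks and flushes once a threshold is crossed, plus an explicit `flush()` the caller invokes when `index` completes. `BatchedFlushStore`, `flush_every`, and `RecordingClient` are hypothetical names for illustration; only `upsert(collection_name=..., data=...)` and `flush(<collection>)` mirror the client calls used elsewhere in this report.

```python
class RecordingClient:
    """Hypothetical stub so the sketch runs without a Milvus server."""

    def __init__(self):
        self.flush_calls = 0
        self.rows = 0

    def upsert(self, collection_name, data):
        self.rows += len(data)
        return {"upsert_count": len(data)}

    def flush(self, collection_name):
        self.flush_calls += 1


class BatchedFlushStore:
    """Flushes after every `flush_every` chunks instead of after every upsert."""

    def __init__(self, client, collection, flush_every=1000):
        self._client = client
        self._collection = collection
        self._flush_every = flush_every
        self._pending = 0  # chunks upserted since the last flush

    def upsert(self, chunks):
        if not chunks:
            return 0
        result = self._client.upsert(collection_name=self._collection, data=chunks)
        self._pending += len(chunks)
        if self._pending >= self._flush_every:
            self.flush()
        return result["upsert_count"]

    def flush(self):
        # Skip the round trip when nothing has been written since the last flush.
        if self._pending:
            self._client.flush(self._collection)
            self._pending = 0


client = RecordingClient()
store = BatchedFlushStore(client, "memsearch_chunks", flush_every=5)
store.upsert([{"chunk_hash": "a"}] * 3)  # 3 pending, below the threshold
store.upsert([{"chunk_hash": "b"}] * 3)  # 6 pending, triggers a flush
store.upsert([{"chunk_hash": "c"}] * 2)  # 2 pending again
store.flush()                            # final flush at the end of the index run
```

The trade-off is that rows written since the last flush stay unflushed until the threshold or the final `flush()` is reached, so the end-of-run call is not optional.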

## Impact

This is silent data loss from the user's perspective — the CLI reports success, the `~/.memsearch/<project>/chunks/` markdown layer is intact, but the vector index is empty so all searches return nothing. Diagnosis required reading `pymilvus` server logs to confirm zero `Insert`/`Upsert` RPCs were ever executed, despite memsearch's claim of 219 successful writes.

## Workaround for users

Until fixed, run `client.flush('memsearch_chunks')` manually after every `memsearch index` call.

---

Filed by a Claude Code user debugging memsearch + remote Milvus integration. Happy to test a patch if you'd like.
