Problem
When EMBEDDING_PROVIDER=ollama and multiple concurrent add() calls are made, postIngestAnalysis() fires one embed() call per add(). Under benchmark load (150+ concurrent adds), Ollama's embedding endpoint times out with HeadersTimeoutError.
Root Cause
postIngestAnalysis() is invoked fire-and-forget (async, never awaited), so nothing throttles it. With concurrent ingestion, hundreds of embedding requests hit Ollama simultaneously. The local nomic-embed-text model can only process ~2 req/s, so requests pile up and time out in a cascade.
Impact
- Benchmark with per-session storage (CONSOLIDATION_INNER_BATCH_SIZE=1) + embeddings fails under load
- Individual add() calls still work (the error is non-blocking)
- Consolidation embedding (embedBatch) also competes for Ollama capacity
Proposed Fix
- Add a concurrency limiter on embedding calls (max 2-3 concurrent Ollama requests)
- Queue post-ingest analysis instead of fire-and-forget
- Or: skip postIngestAnalysis when consolidation will embed anyway (avoid double-embedding)
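The first two fixes can share one primitive: a small promise-based semaphore that both caps concurrent Ollama requests and implicitly queues the overflow. A minimal sketch follows; the Limiter class, the EMBED_CONCURRENCY value of 2, and the embed() call it wraps are illustrative names, not the project's actual API.

```typescript
// Sketch of a concurrency limiter for embedding calls.
// Excess calls wait in a FIFO queue; a finishing task hands its
// slot directly to the next waiter, so the cap is never exceeded.
class Limiter {
  private active = 0;
  private queue: Array<() => void> = [];

  constructor(private readonly max: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.max) {
      // At capacity: park until a finishing task hands us its slot.
      await new Promise<void>((resolve) => this.queue.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.queue.shift();
      if (next) next(); // transfer the slot, active count unchanged
      else this.active--;
    }
  }
}

// Hypothetical wiring: one shared limiter for all Ollama embedding traffic,
// so postIngestAnalysis() and consolidation (embedBatch) draw from the
// same budget instead of competing blindly.
const EMBED_CONCURRENCY = 2; // matches the ~2 req/s the model sustains
const embedLimiter = new Limiter(EMBED_CONCURRENCY);

// e.g. inside postIngestAnalysis():
//   const vector = await embedLimiter.run(() => embed(text));
```

With this in place, 150 concurrent add() calls enqueue 150 embed tasks but Ollama only ever sees 2 in flight, which avoids the HeadersTimeoutError at the cost of higher end-to-end latency for the later tasks.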