perf: post-ingest async embedding overwhelms Ollama under concurrent load #67

## Problem
When EMBEDDING_PROVIDER=ollama, every add() call triggers postIngestAnalysis(), which issues one embed() call. Under benchmark load (150+ concurrent adds), requests to Ollama's embedding endpoint time out with HeadersTimeoutError.

## Root Cause
postIngestAnalysis() runs as a fire-and-forget async call, so nothing bounds the number of in-flight embedding requests. With concurrent ingestion, hundreds of them hit Ollama simultaneously. The local nomic-embed-text model can only process ~2 req/s, so requests pile up behind it and time out in a cascade.
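
A minimal sketch of the pattern described above, not the project's actual code: `embed`, `add`, and `postIngestAnalysis` here are hypothetical stand-ins with assumed signatures, and the HTTP call to Ollama is replaced by a timer. It only illustrates that, without a limiter, peak in-flight embedding requests grow with the number of concurrent adds.

```typescript
// Hypothetical stand-in for the embedding client; it tracks in-flight requests.
let inFlight = 0;
let peakInFlight = 0;

async function embed(text: string): Promise<number[]> {
  inFlight++;
  peakInFlight = Math.max(peakInFlight, inFlight);
  try {
    // Simulated latency in place of a real HTTP call to Ollama.
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    return [text.length];
  } finally {
    inFlight--;
  }
}

// Kept only so the demo can wait for the background work to finish.
const pending: Promise<void>[] = [];

// Assumed shape of the current behavior: the embedding is started, never awaited.
function postIngestAnalysis(text: string): void {
  const p = embed(text)
    .then(() => undefined)
    .catch(() => undefined); // errors are swallowed, so add() is unaffected
  pending.push(p);
}

async function add(text: string): Promise<void> {
  // ... storing the memory would happen here ...
  postIngestAnalysis(text);
}

// Fires n concurrent adds and reports the peak number of in-flight embed() calls.
async function simulate(n: number): Promise<number> {
  await Promise.all(Array.from({ length: n }, (_, i) => add(`memory ${i}`)));
  await Promise.all(pending);
  return peakInFlight;
}
```

In this simulation, `simulate(150)` peaks at 150 simultaneous requests, because every `embed()` call starts before any of them completes.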

## Impact
- Benchmark with per-session storage (CONSOLIDATION_INNER_BATCH_SIZE=1) + embeddings fails under load
- Individual add() calls still work (the error is non-blocking)
- Consolidation embedding (embedBatch) also competes for Ollama capacity

## Proposed Fix
1. Add a concurrency limiter on embedding calls (max 2-3 concurrent Ollama requests)
2. Queue post-ingest analysis instead of fire-and-forget
3. Alternatively, skip postIngestAnalysis() when consolidation will embed anyway (avoids double-embedding)
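
Options 1 and 2 could be combined in a small promise-based limiter that queues calls beyond a fixed concurrency. The sketch below is one possible shape, not existing project code: `ConcurrencyLimiter` and `limitEmbed` are hypothetical names, and the `embed` signature is assumed.

```typescript
type Task<T> = () => Promise<T>;

// Hypothetical helper: runs at most `maxConcurrent` tasks at once, queues the rest (FIFO).
class ConcurrencyLimiter {
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly maxConcurrent: number;

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent;
  }

  async run<T>(task: Task<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // No free slot: wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // pass the slot directly to the oldest waiter
      } else {
        this.active--;
      }
    }
  }
}

// Wraps an embed function (assumed signature) so calls share one limiter.
function limitEmbed(
  embed: (text: string) => Promise<number[]>,
  maxConcurrent: number,
): (text: string) => Promise<number[]> {
  const limiter = new ConcurrencyLimiter(maxConcurrent);
  return (text) => limiter.run(() => embed(text));
}
```

If postIngestAnalysis() and embedBatch went through the same wrapped function, they would share the 2-3 request budget instead of competing for Ollama. The queue here is unbounded, so a sustained ingest rate above ~2 req/s would still grow the backlog; a cap or drop policy would be a separate decision.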
