perf: post-ingest async embedding overwhelms Ollama under concurrent load #67

@salishforge

Problem

With EMBEDDING_PROVIDER=ollama, postIngestAnalysis() fires one embed() call for every add(). When many add() calls run concurrently, as in the benchmark (150+ concurrent adds), requests to Ollama's embedding endpoint time out with HeadersTimeoutError.

Root Cause

postIngestAnalysis() runs as a fire-and-forget async call, so nothing bounds the number of embedding requests in flight. With concurrent ingestion, hundreds of them hit Ollama simultaneously. The local nomic-embed-text model can only process ~2 req/s, so requests arrive faster than they complete and the timeouts cascade.

Impact

  • The benchmark fails under load when per-session storage (CONSOLIDATION_INNER_BATCH_SIZE=1) is combined with embeddings
  • Individual add() calls still succeed, because the embedding error is non-blocking
  • Consolidation embedding (embedBatch) competes for the same Ollama capacity

Proposed Fix

  1. Add a concurrency limiter on embedding calls (at most 2-3 concurrent Ollama requests)
  2. Queue post-ingest analysis instead of running it fire-and-forget
  3. Alternatively, skip postIngestAnalysis() when consolidation will embed anyway, which avoids double-embedding
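
A minimal sketch of options 1 and 2 combined: a small in-process limiter that caps concurrent calls and queues the rest in FIFO order. The names createLimiter, fakeEmbed, and demo are hypothetical and not part of the codebase; fakeEmbed stands in for the real embed() call, whose signature is not shown in this issue. The sketch does not add retries or timeouts.

```typescript
// Hypothetical helper: runs at most `maxConcurrent` tasks at once and
// queues the remaining ones in arrival order.
function createLimiter(maxConcurrent: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  return async function limit<T>(task: () => Promise<T>): Promise<T> {
    if (active >= maxConcurrent) {
      // Wait until a finishing task hands its slot over to us.
      await new Promise<void>((resolve) => waiting.push(() => resolve()));
    } else {
      active++;
    }
    try {
      return await task();
    } finally {
      const next = waiting.shift();
      if (next) {
        next(); // pass the slot directly to the next waiter; `active` is unchanged
      } else {
        active--;
      }
    }
  };
}

// Stand-in for the real embed() call (assumption: it takes text and returns a vector).
async function fakeEmbed(text: string): Promise<number[]> {
  await new Promise<void>((resolve) => setTimeout(resolve, 1));
  return [text.length];
}

// Simulates `count` concurrent add() calls and reports the peak number of
// embedding calls that were in flight at the same time.
async function demo(maxConcurrent: number, count: number): Promise<number> {
  const limitEmbedding = createLimiter(maxConcurrent);
  let inFlight = 0;
  let peak = 0;
  await Promise.all(
    Array.from({ length: count }, (_, i) =>
      limitEmbedding(async () => {
        inFlight++;
        peak = Math.max(peak, inFlight);
        try {
          return await fakeEmbed(`doc ${i}`);
        } finally {
          inFlight--;
        }
      })
    )
  );
  return peak;
}
```

If postIngestAnalysis() and embedBatch share one limiter instance, consolidation and post-ingest embedding would draw from the same budget rather than competing for Ollama independently. Whether that fits the existing module layout is an open question.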

Labels: bug, performance