Problem
During the Singapore enterprise ApeRAG import on 2026-04-29, vector and fulltext converged for 169 documents, but graph index creation was much slower. At runtime we observed graph worker concurrency at the document level only: with 2 API pods and graph worker concurrency=4 per pod, there were about 7 RUNNING graph documents. However, ACTIVE increased very slowly because each large document processes its chunks sequentially.
Code path:
- aperag/indexing/orchestrator.py: run_graph_worker = _entrypoint(Modality.GRAPH, concurrency=4) controls document-level worker concurrency.
- aperag/indexing/graph_extractor.py: _extractor(chunks) loops for chunk in chunks and awaits _extract_one_chunk(...) before moving to the next chunk.
- Default per-chunk timeout is 60s, so large PDFs with many chunks can occupy one graph worker for a long time.
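To make the occupancy concrete, here is a rough worst-case estimate for the sequential loop described above. It assumes every chunk runs up to the 60s timeout and that there are no retries. The function name and the 200-chunk figure are made up for illustration and are not measurements from the import.

```python
def worst_case_sequential_seconds(num_chunks: int, per_chunk_timeout_s: float = 60.0) -> float:
    """Upper bound on how long one document can hold a graph worker
    when chunks are awaited one after another (no retries assumed)."""
    if num_chunks < 0:
        raise ValueError("num_chunks must be non-negative")
    return num_chunks * per_chunk_timeout_s


# Hypothetical large PDF split into 200 chunks.
seconds = worst_case_sequential_seconds(200)
print(f"{seconds:.0f} s = {seconds / 60:.0f} min")
```

Real chunks normally finish well under the timeout, so this is only a ceiling, but it shows why a single large document can pin one of the 4 worker slots on a pod for a long time.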
Impact
- CPU can stay low while graph completion is slow because work is mostly remote LLM I/O.
- Scaling API pods increases document-level graph worker count, but does not reduce latency for one large document.
- In deployments where API and workers are in the same process, long graph jobs also increase API tail latency/readiness risk.
Suggested fix
Add bounded per-document chunk-level concurrency for graph extraction, configurable in collection.config.knowledge_graph_config or deployment config. For example:
- chunk_concurrency: small default (e.g. 4) with an upper bound.
- Use an asyncio.Semaphore around per-chunk LLM calls and asyncio.gather(..., return_exceptions=True) while preserving per-chunk failure isolation.
- Keep global/document-level graph worker concurrency separate from chunk-level concurrency to avoid overloading the model provider.
- Add tests that prove multiple chunks can be in flight and that one failed chunk does not fail the whole document.
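A minimal sketch of the suggested shape, not the actual ApeRAG code. The names extract_chunks_bounded, resolve_chunk_concurrency and the upper bound of 16 are hypothetical, and extract_one stands in for _extract_one_chunk(...), whose real signature is not shown in this issue.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Optional, Sequence, Tuple

# Hypothetical bounds; the issue only asks for a small default with an upper bound.
DEFAULT_CHUNK_CONCURRENCY = 4
MAX_CHUNK_CONCURRENCY = 16


def resolve_chunk_concurrency(requested: Optional[int]) -> int:
    """Clamp a configured value into [1, MAX_CHUNK_CONCURRENCY]."""
    if requested is None:
        return DEFAULT_CHUNK_CONCURRENCY
    return max(1, min(requested, MAX_CHUNK_CONCURRENCY))


async def extract_chunks_bounded(
    chunks: Sequence[Any],
    extract_one: Callable[[Any], Awaitable[Any]],
    chunk_concurrency: Optional[int] = None,
    per_chunk_timeout_s: float = 60.0,
) -> Tuple[List[Any], List[Tuple[int, BaseException]]]:
    """Run extract_one over chunks with at most N calls in flight.

    Returns (successful results in chunk order, [(chunk_index, exception), ...]).
    A failing or timed-out chunk is recorded and does not cancel the others.
    """
    semaphore = asyncio.Semaphore(resolve_chunk_concurrency(chunk_concurrency))

    async def guarded(chunk: Any) -> Any:
        # The semaphore bounds in-flight LLM calls for this one document only;
        # document-level worker concurrency stays a separate setting.
        async with semaphore:
            return await asyncio.wait_for(extract_one(chunk), timeout=per_chunk_timeout_s)

    outcomes = await asyncio.gather(
        *(guarded(chunk) for chunk in chunks), return_exceptions=True
    )

    results: List[Any] = []
    failures: List[Tuple[int, BaseException]] = []
    for index, outcome in enumerate(outcomes):
        if isinstance(outcome, BaseException):
            failures.append((index, outcome))
        else:
            results.append(outcome)
    return results, failures
```

With this shape, the peak number of provider calls per pod is roughly document-level concurrency times chunk_concurrency (4 x 4 = 16 with the values above), which is why the two limits are best tuned separately. A test along the lines of the last bullet could pass a fake extract_one that counts in-flight calls and raises for one chunk, then assert that the peak is greater than 1 and that only that chunk's index appears in failures.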
Related operational finding
This is separate from ApeRAG#1866, where graph compaction logs non-fatal warnings due to missing keyword arguments.